Dataset description: DCAT and other vocabularies
Valeria Pesce
Secretariat of the Global Forum on Agricultural Research (GFAR) and
Secretariat of the Global Open Data for Agriculture and Nutrition (GODAN) initiative
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Data and datasets
DATA
Wikipedia: “Data are values of qualitative or quantitative variables, belonging to a set of items. Data as an abstract concept can be viewed as the lowest level of abstraction, from which information and then knowledge are derived.”
DATASETS
Wikipedia: “A dataset (or data set) is a collection of data.” (Narrow definition) “… the contents of a single database table, or a single statistical data matrix, where each column of the table represents a particular variable, and each row corresponds to a given member of the dataset in question. […] Non-tabular datasets can take the form of marked-up strings of characters, such as an XML file.”
W3C Government Linked Data Working Group (DCAT vocabulary), http://www.w3.org/TR/vocab-dcat/#class--dataset:
“A collection of data, published or curated by a single source, and available for access or download in one or more formats.”
Single dataset
• As long as you consider a dataset alone, you may not need structured metadata about it for further re-use, provided the dataset uses sector standards and is documented in some way.
• When managing a single dataset, or a few homogeneous datasets shared between colleagues, it may be enough to use sector-specific standards (e.g. Multi-crop descriptors and Darwin Core for germplasm, INSPIRE for soil/geo, “Minimum Set” recommendations for observations, sector code lists, vocabularies) or application-specific standards.
Datasets in repositories
But what happens when you have a big data repository with heterogeneous datasets?
Or when you have only a few datasets but want to contribute them to a huge data repository/catalog that holds many other datasets?
Users will want to find your dataset among many others, possibly together with datasets holding similar data and using the same standards, measures and syntax.
They will use tools to search for datasets.
Then dataset metadata (or a machine-readable dataset description) becomes important.
Why machine-readable descriptions?
• Data will be re-used by applications. Others will search for and make use of your data through tools.
- Datasets have to be found by applications.
- Datasets have to be understood by applications.
• Datasets should be managed in data repositories / data catalogs. Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets.
• Data catalogs themselves are implemented as applications, so they need machine-readable dataset metadata.
[Diagram: three levels of data management.
Dataset: a table of records (“members”, observations) with columns Ref ID / URI, Dimension2, Dimension3, Dimension4; each cell is a datum.
Data repository (file system): where the datasets are stored.
Data catalog (also a dataset): a table with one row per dataset and columns DatasetRef, Metadata1, Metadata2, Metadata3; the catalog has its own CatalogID and is labelled with Search, Export and Data type.
Tabular only for the sake of simplification: it could be triples or other data structures.]
We only focus on the data catalog level.
Dataset metadata
So what metadata do applications need to find in data catalogs?
Dataset metadata for applications
General metadata about the dataset:
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…
5) the conditions for re-use (rights, licenses)
6) provenance, versions
7) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
8) time and space slices, subsets
9) the “dimensions” and “variables” covered by the dataset
10) the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
WHY dimensions and semantics?
Example of a search by a researcher:
“I’m doing research on plant phenotypes: give me all datasets of crop phenotypic data that include the dimensions of time, geographic location and height, plus the units of measure used for time and height, where geographic location is expressed as coordinates (because my software only processes coordinates).”
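The researcher's query can be sketched as a filter over catalog records. This is only an illustration under stated assumptions: the record layout (`DatasetRecord`, `Dimension`) and every field name and value below are invented for the example and do not come from DCAT or from any catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """One dimension or variable of a dataset (hypothetical layout)."""
    name: str          # e.g. "time", "location", "height"
    unit: str = ""     # unit of measure, if any
    syntax: str = ""   # how values are expressed, e.g. "coordinates"


@dataclass
class DatasetRecord:
    """Catalog entry describing one dataset (hypothetical layout)."""
    identifier: str
    data_type: str
    dimensions: list = field(default_factory=list)

    def dimension(self, name):
        """Return the dimension with the given name, or None."""
        return next((d for d in self.dimensions if d.name == name), None)


def find_phenotype_datasets(catalog):
    """Select phenotypic datasets with time, height and location as coordinates.

    Returns (identifier, time unit, height unit) for each match.
    """
    matches = []
    for record in catalog:
        if record.data_type != "crop phenotypic data":
            continue
        time = record.dimension("time")
        location = record.dimension("location")
        height = record.dimension("height")
        if time is None or location is None or height is None:
            continue
        if location.syntax != "coordinates":
            continue
        matches.append((record.identifier, time.unit, height.unit))
    return matches


catalog = [
    DatasetRecord("ds-1", "crop phenotypic data", [
        Dimension("time", unit="day"),
        Dimension("location", syntax="coordinates"),
        Dimension("height", unit="cm"),
    ]),
    DatasetRecord("ds-2", "crop phenotypic data", [
        Dimension("time", unit="year"),
        Dimension("location", syntax="place name"),
        Dimension("height", unit="m"),
    ]),
    DatasetRecord("ds-3", "soil data", [Dimension("time", unit="year")]),
]

results = find_phenotype_datasets(catalog)
print(results)
```

Only the first record matches: the second expresses location as a place name, and the third holds a different type of data. The query is answerable only because the dimensions, units and syntax are recorded as structured values.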
Data serializations
• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data.
• In many dataset description models, the metadata about the data serialization is attached to the dataset.
• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is attached not to the dataset but to related entities called “distributions”.
Distribution metadata for applications
Applications have to find metadata about the actual “serializations” or “distributions” of the dataset in order to understand:
1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset:
- format (file format, data format)
- protocol, API parameters…
And, if they differ between distributions, again:
3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
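A minimal sketch of the dataset/distribution split described above, assuming a hypothetical in-memory model: the class names, field names and example.org URLs are invented for illustration. It shows an application picking a distribution it can parse, and falling back to the dataset's license when the distribution does not state its own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Distribution:
    """One serialization of a dataset (hypothetical field names)."""
    url: str                       # where to retrieve it
    data_format: str               # file or data format, e.g. "CSV"
    protocol: str = "HTTP"         # how to retrieve it
    parameters: dict = field(default_factory=dict)  # API parameters, if any
    license: Optional[str] = None  # only set when it differs from the dataset's


@dataclass
class Dataset:
    """The collection as a whole, with its distributions."""
    title: str
    license: str
    distributions: list = field(default_factory=list)


def effective_license(dataset, distribution):
    """Use the distribution's own license if it has one, else the dataset's."""
    return distribution.license or dataset.license


def pick_distribution(dataset, supported_formats):
    """Return the first distribution whose format the application can parse."""
    for distribution in dataset.distributions:
        if distribution.data_format in supported_formats:
            return distribution
    return None


soil = Dataset(
    title="Soil observations",
    license="http://example.org/licenses/open",
    distributions=[
        Distribution("http://example.org/soil/sparql", "RDF", protocol="SPARQL"),
        Distribution("http://example.org/soil.csv", "CSV",
                     license="http://example.org/licenses/restricted"),
    ],
)

chosen = pick_distribution(soil, {"CSV", "JSON"})
print(chosen.url, effective_license(soil, chosen))
```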
WHY protocols and API parameters?
Example: some datasets are available behind a service. For instance, RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, and national research institutes provide access to datasets behind web services that accept parameters to filter the data.
Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, if it knows the parameters required by each SOAP service and the required syntax and type of those parameters.
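The use case can be sketched as follows. Note the substitution: to keep the example short it builds a plain HTTP query string rather than a SOAP envelope, and the parameter description format (`PARAMETER_SPECS`), the parameter names and the endpoint URL are all hypothetical. The point is only that a machine-readable description of required parameters and their types lets an application build valid requests without human help.

```python
from urllib.parse import urlencode

# Hypothetical machine-readable description of one service's parameters,
# as an application might read it from distribution metadata.
PARAMETER_SPECS = {
    "country": {"type": str, "required": True},
    "year": {"type": int, "required": True},
    "crop": {"type": str, "required": False},
}


def build_request_url(endpoint, specs, values):
    """Validate `values` against the parameter description and build the URL."""
    unknown = set(values) - set(specs)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    query = {}
    for name, spec in specs.items():
        if name not in values:
            if spec["required"]:
                raise ValueError(f"missing required parameter: {name}")
            continue
        value = values[name]
        if not isinstance(value, spec["type"]):
            raise TypeError(f"{name} must be of type {spec['type'].__name__}")
        query[name] = value
    return f"{endpoint}?{urlencode(query)}"


url = build_request_url(
    "http://example.org/service",
    PARAMETER_SPECS,
    {"country": "IT", "year": 1990},
)
print(url)
```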
General issue with all metadata
Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.
- The value should be standardized, possibly a URI.
- The value should be part of an authority list / code list.
And… there is no authoritative “value vocabulary” or code list for many of these values.
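What standardizing a value against a code list means in practice can be sketched like this. Since no authoritative list exists for many of these values, the code list below is entirely made up, and its example.org URIs are placeholders.

```python
# Hypothetical code list mapping accepted labels to placeholder URIs.
FORMAT_CODE_LIST = {
    "csv": "http://example.org/formats/csv",
    "rdf/xml": "http://example.org/formats/rdf-xml",
    "netcdf": "http://example.org/formats/netcdf",
}


def standardize(value, code_list):
    """Map a free-text value to its code-list URI, or None if it is not listed."""
    return code_list.get(value.strip().lower())


def check_record(record, code_list, field_name="format"):
    """Return (standardized URI, problem) for one metadata field of a record."""
    raw = record.get(field_name, "")
    uri = standardize(raw, code_list)
    if uri is None:
        return None, f"{field_name} value {raw!r} is not in the code list"
    return uri, None


print(check_record({"format": " CSV "}, FORMAT_CODE_LIST))
print(check_record({"format": "spreadsheet"}, FORMAT_CODE_LIST))
```

Two catalogs that both store the URI can be searched together; two catalogs that store “CSV” and “comma separated” as free text cannot, unless someone maps the values by hand.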
No out-of-the-box solution
• Do existing data catalog tools normally cover these metadata? NO
• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO
BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites.” (from the W3C page on “Best Practices for Publishing Linked Data”)
Dataset description vocabularies
Let’s see if the main vocabularies for describing datasets provide the metadata we think are needed for full interoperability.
Semantic interoperability
In this presentation we cover only RDF vocabularies, with a special focus on semantic interoperability.
Dataset metadata were managed in several ways before semantic technologies: see NetCDF or HDF5 structures and the various hierarchical, array-based structures used especially for observation datasets, with dataset metadata at the top and data arrays below.
Dataset description vocabularies
• DCAT vocabulary
- RDF vocabulary for describing any dataset
- Datasets can be standalone or part of a “catalog”
- Metadata about the dataset (collection) and related distributions
• DataCube vocabulary
- RDF vocabulary for describing statistical datasets
- Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
- RDF vocabulary for expressing metadata about RDF datasets
- Useful especially for metadata related to RDF data services
Definition of “dataset” in DCAT
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.
The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.
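The dataset/distribution model can be sketched as a small Turtle description. This is a rough illustration, not a reference serializer: it uses only the Python standard library, the example.org URIs and the title are placeholders, and writing `dcat:mediaType` as a plain string literal is a simplification. The terms used (`dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:accessURL`, `dcat:mediaType`, `dct:title`) are from the DCAT and Dublin Core Terms namespaces.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def literal(text):
    """Quote a plain string as a Turtle literal (escapes backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_dataset(uri, title, distributions):
    """Serialize one dataset and its distributions as Turtle.

    `distributions` is a list of (distribution URI, access URL, media type).
    """
    lines = [f"<{uri}> a dcat:Dataset", f"    dct:title {literal(title)}"]
    if distributions:
        links = " , ".join(f"<{d_uri}>" for d_uri, _, _ in distributions)
        lines.append(f"    dcat:distribution {links}")
    blocks = [" ;\n".join(lines) + " ."]
    for d_uri, access_url, media_type in distributions:
        blocks.append(
            f"<{d_uri}> a dcat:Distribution ;\n"
            f"    dcat:accessURL <{access_url}> ;\n"
            f"    dcat:mediaType {literal(media_type)} ."
        )
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


ttl = describe_dataset(
    "http://example.org/dataset/1",
    "Soil observations",
    [
        ("http://example.org/dataset/1/csv", "http://example.org/files/soil.csv", "text/csv"),
        ("http://example.org/dataset/1/feed", "http://example.org/feeds/soil", "application/rss+xml"),
    ],
)
print(ttl)
```

One dataset, two distributions: the descriptive metadata (title) stays on the dataset, while the access URL and format sit on each distribution.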
DCAT model
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) format
12) protocols, parameters: NO
DCAT and DCAT-AP
The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.
It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.
The elaboration of DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.
A diagram of the full DCAT-AP specification is on the next slide.
Full DCAT-AP
[Diagram of the full DCAT-AP specification, with callouts: versions; rights and provenance; standards; rights; format; relation]
More than DCAT:
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO
Limitations of DCAT
It doesn’t cover:
• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets
Combining DCAT with other vocabularies
“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.” (from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures.
• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…).
• The measure components represent the phenomenon being observed (e.g. life expectancy).
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional).
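The three component kinds can be sketched as a Data Cube structure definition written out in Turtle. This is an illustrative sketch using only the Python standard library: `eg:dsd-le` and `eg:lifeExpectancy` are hypothetical names under a placeholder example.org namespace, while `qb:DataStructureDefinition`, `qb:component`, `qb:dimension`, `qb:measure`, `qb:attribute` and the `sdmx-dimension` / `sdmx-attribute` terms come from the Data Cube and SDMX namespaces.

```python
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-attribute": "http://purl.org/linked-data/sdmx/2009/attribute#",
    "eg": "http://example.org/ns#",  # placeholder namespace
}

# (component kind, property) pairs for a life-expectancy cube.
COMPONENTS = [
    ("dimension", "sdmx-dimension:refPeriod"),   # time
    ("dimension", "sdmx-dimension:refArea"),     # geographic region
    ("dimension", "sdmx-dimension:sex"),         # gender
    ("measure", "eg:lifeExpectancy"),            # the phenomenon observed
    ("attribute", "sdmx-attribute:unitMeasure"), # unit of measure
]

ALLOWED_KINDS = {"dimension", "measure", "attribute"}


def structure_to_turtle(dsd_name, components):
    """Write a qb:DataStructureDefinition with one qb:component per entry."""
    for kind, _ in components:
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown component kind: {kind}")
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    lines.append(f"{dsd_name} a qb:DataStructureDefinition ;")
    specs = [f"    qb:component [ qb:{kind} {prop} ]" for kind, prop in components]
    lines.append(" ;\n".join(specs) + " .")
    return "\n".join(lines)


dsd = structure_to_turtle("eg:dsd-le", COMPONENTS)
print(dsd)
```

Because the structure is itself data, a catalog application can answer questions such as “which datasets have a time dimension and state their unit of measure” by inspecting the components.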
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.
More than DCAT-AP:
1) dimensions and semantics: YES
2) slices, subsets: YES
VOID model
[Diagram of the VOID model, with callout: dct:license, wv:norms, wv:waiver]
More than DCAT-AP:
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
Complementing DCAT
• For dimensions, semantics of dimensions, slicing
- DataCube
- DDI
• For API aspects
- VOID (linked data)
- Web services descriptions (Hydra, WSDL, WADL)
• For relations to organizations, projects, publications, funding…
- CERIF for datasets
- VIVO Datastar
Many vocabularies
Vocabularies with relations to DCAT or the same model:
• DCAT-AP and other extensions
• W3C HCLS dataset descriptions; DataID (extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information); GeoDCAT-AP (geospatial extension of DCAT)
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org
Other vocabularies:
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE
Examples of application of DCAT
• CKAN data catalog tool (more in your workshop)
• Data catalogs
- data.gov, data.gov.uk, data.gov.au
- Africa Open Data, Indonesia Data Portal
- EU Data Portal
- More here: https://ckan.org/instances
• CIARD RING federated data catalog, managed by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model).
• A dataset is identified by a uniform type of content and a uniform data structure (dimensions, metadata set, encoding, reference value lists).
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL).
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published.
Sample dataset record
RDF of the record:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:void="http://rdfs.org/ns/void#"
         xmlns:schema="http://schema.org/">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <rdf:type rdf:resource="http://schema.org/Dataset"/>
    <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <schema:name>National Soil Database representative Soil Systems geography</schema:name>
    <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
    <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
    <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
    <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
    <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
    <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
    <dct:spatial>National</dct:spatial>
    <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
    <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
    <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
    <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
    <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
    <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
    <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
  </rdf:Description>
</rdf:RDF>
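How an application might read such a record can be sketched with Python's standard XML parser. The record embedded below is a shortened stand-in with placeholder example.org URIs, not the RING record itself. Treating RDF/XML as plain XML only works for this flat `rdf:Description` layout; a dedicated RDF library would be the more robust choice.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Shortened stand-in record with placeholder URIs.
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://example.org/node/1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title>Example soil dataset</dct:title>
    <dcat:distribution rdf:resource="http://example.org/distribution/1"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree exposes namespaced attributes as "{namespace}localname".
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def read_datasets(rdf_xml):
    """Return URI, title and distribution URIs of each dcat:Dataset description."""
    root = ET.fromstring(rdf_xml)
    datasets = []
    for desc in root.findall("rdf:Description", NS):
        types = {t.get(RDF_RESOURCE) for t in desc.findall("rdf:type", NS)}
        if NS["dcat"] + "Dataset" not in types:
            continue
        datasets.append({
            "uri": desc.get(RDF_ABOUT),
            "title": desc.findtext("dct:title", default="", namespaces=NS),
            "distributions": [
                d.get(RDF_RESOURCE) for d in desc.findall("dcat:distribution", NS)
            ],
        })
    return datasets


datasets = read_datasets(RECORD)
print(datasets)
```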
References
1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Thank you
Valeria Pesce
valeria.pesce@fao.org
Data and datasets
WikipedialdquoData are values of qualitative or quantitative variables belonging to a set of items Data as an abstract concept can be viewed as the lowest level of abstraction from which information and then knowledge are derivedrdquo
Wikipedia A dataset (or data set) is a collection of data(Narrow definition) hellipthe contents of a single database table or a single statistical data matrix where each column of the table represents a particular variable and each row corresponds to a given member of the dataset in question [hellip]Nontabular datasets can take the form of marked up strings of characters such as an XML file
W3C Government Linked Data Working Group (DCAT vocabulary) httpwwww3orgTRvocab-dcatclass--dataset
A collection of data published or curated by a single source and available for access or download in one or more formats
DATA
DATASETS
Single dataset
bull As long as you consider a dataset alone you may not need structured metadata about the dataset for further re-use as long as the dataset uses sector standards and is documented in some way
bull When managing a single dataset or a few homogeneous datasets between colleagues it may be enough to use sector-specific standards (eg Multi-crop descriptors and Darwin Core for germplasm INSPIRE for soilgeo ldquoMinimum Setrdquo recommendations for observations sector code lists vocabularies) or application-specific standards
Datasets in repositories
But what happens when you have a big data repository with heterogeneous datasets
Or you have few datasets but you want to contribute them to a huge data repositorycatalog where many other datasets are
Users will want to find your dataset among many others possibly together with datasets with similar data using the same standards measures syntax
They will use tools to search for datasets
Then dataset metadata (or a machine-readable dataset description)
becomes important
Why machine-readable descriptions
bull Data will be re-used by applicationsOthers will search and make use of your data through tools
Datasets have to be found by applications Datasets have to be understood by applications
bull Datasets should be managed in data repositories data catalogs Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets
bull Data catalogs themselves are implemented as applications so they need machine-readable dataset metadata
Ref ID URI Dimension2 Dimension3 Dimension4
Value11 Value12 Value13 Value14
Value21 Value22 Value23 Value24
Value31 Value32 Value33 Value34
Value41 Value42 Value43 Value44
Dataset
Datum
Record ldquomemberrdquo observation
File system
Data repository
DatasetRef Matadata1 Metadata2 Metadata3
Ref1 Address1 Value12 Value13
Ref2 Address2 Value22 Value23
Data catalog (also a dataset)
Tabular only for the sake of simplification it could be triples or other data structures
CatalogID Value1 Value2 Value3
Search ExportData type
We only focus on the data catalog level
Dataset metadata
So what metadata do applications need to find in data catalogs
Dataset metadata for applications
1) General metadata about the dataset1) identifier(s)2) who is responsible for it3) when and where the data were collected4) relations to organizations persons publications software
projects fundinghellip5) the conditions for re-use (rights licenses)6) provenance versions7) the specific coverage of the dataset (type of data thematic
coverage geographic coverage)
8) time and space slices subsets 9) the ldquodimensionsrdquo and ldquovariablesrdquo covered by the dataset10) the semantics of the dimensions (units of measure time
granularity syntax reference taxonomies)
WHY dimensions and semantics
Example of search by researcher
Irsquom doing research on plant phenotypes give me all datasets of crop phenotypic data that include the dimensions of time geographic location and height plus units of measure used for time and height where geographic location is expressed as coordinates (because my software only processes coordinates)
Data seralizations
bull The metadata above refer to the collection as a whole but additional technical metadata refer to the different ldquoserializationsrdquo of the datahellip
bull In many dataset description models the metadata about the data serialization is attached to the dataset
bull In other dataset description models information about the data serializations is not considered inherent to the nature and content of the data collection so it is not attached to the dataset but rather to related entities called ldquodistributionsrdquo
Distribution metadata for applications
Applications have to find metadata about the actual ldquoserializationsrdquo or ldquodistributionsrdquo of the dataset to understand
1 Where to retrieve it URL (data dump servicehellip)
2 the necessary technical specifications to retrieve and parse a
distribution of the dataset
- format (file format data format)
- protocol API parametershellip
And if different for different distributions again3 the conditions for re-use (rights licenses)
4 the semantics of the dimensions (units of measure time granularity syntax reference taxonomies) if different semantics for different distributions
WHY protocols and API params
Example
Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets
Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters
General issue with all metadata
Standardization of the values eg for ldquothematic coveragerdquo or ldquodimensionsrdquo of datasets ldquoformatrdquo or ldquoprotocol usedrdquo of distributions etc
- The value should be standardized possibly a URI
- The value should be part of an authority list code list
Andhellip There is no authority ldquovalue vocabularyrdquo or code list for many of these values
No out-of-the-box solution
bull Do existing data catalog tools normally cover these metadataNO
bull Do existing metadata vocabularies (RDF and not) cover all these metadata Or do they adopt the same modelNO
BUT by using even basic metadata to describe datasets in data catalogs ldquopublishers increase discoverability and enable applications to consume metadata from multiple catalogs This further enables decentralized publishing of catalogs and facilitates federated dataset search across sitesrdquo(from W3C page on ldquoBest Practices for Publishing Linked Datardquo)
Dataset description vocabularies
Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed
for full interoperability
Semantic interoperability
In this presentation we cover only RDF vocabularies with special focus on semantic interoperability
Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below
Dataset description vocabularies
bull DCAT vocabularybull RDF vocabulary for describing any dataset
bull Datasets can be standalone or part of a ldquocatalogrdquo
bull Metadata about dataset (collection) and related distributions
bull DataCube vocabularybull RDF vocabulary for describing statistical datasets
bull Useful for attaching metadata about the ldquodata structurerdquo of a dataset
bull VOID vocabularybull RDF vocabulary for expressing metadata about RDF datasets
bull Useful especially for metadata related to RDF data services
Definition of ldquodatasetrdquo in DCAT
Definition given by the W3C Government Linked Data Working Group
A dataset is ldquoa collection of data published or curated by a single source and available for access or download in one or more formatsrdquo
The ldquoinstancesrdquo of the dataset ldquoavailable for access or download in one or more formatsrdquo are called ldquodistributionsrdquo A dataset can have many distributions
Examples of distributions include a downloadable CSV file an API or an RSS feed
c
DCAT model
1) identifier(s)2) who is responsible for it3) when and where the data were
collected4) relations to organizations persons
publications software projects fundinghellip NO
5) the conditions for re-use (rights licenses)
6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO
DCAT and DCAT-AP
The DCAT Application profile for data portals in
Europe (DCAT-AP) is an extension of DCAT
It combines DCAT with the W3C Asset Description Metadata
Schema (ADMS) vocabulary plus classes and properties from
Dublin Core SKOS and Vcard in an Application profile
The elaboration of the DCAT-AP was a joint initiative of DG
CONNECT the EU Publications Office and the ISA Programme
A diagram of the full DCAT-AP specification is on the next slide
Full DCAT AP
versions
rights and provenance
standards
rights
format
relation
1) who is responsible for it MORE2) relations to organizations projects publications funding Partly
3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO
More than DCAT
Limitations of DCAT
It doesnrsquot cover
bull semantic relations to organisations persons software projects fundinghellip
bull dimensions and variables and syntax semantics of dimensions and variables
bull protocols and parameters for datasets available through APIs
bull time and space slices subsets
Combining DCAT with other vocabularies
ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions attributes and measures
bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)
bull The measure components represent the phenomenon being observed (eg life expectancy)
bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset also non-statistical
1) dimensions and semantics YES2) slices subsets YES
More than DCAT-AP
VOID model
c
dctlicensewvnorms wvwaiver
3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES
More than DCAT-AP
Complementing DCAT
bull For dimensions semantics of dimensions slicingbull DataCube
bull DDI
bull For API aspectsbull VOID (linked data)
bull Web services descriptions (Hydra (WSDL WADL))
bull For relations to organizations projects publications fundinghellipbull CERIF for datasets
bull VIVO Datastar
Many vocabulariesVocabularies with relations to DCAT or same model
bull DCAT-AP and other extensions
bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT
bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)
bull Schemaorg
Other vocabulariesbull DataCite and re3data
bull CERIF for datasets
bull VIVO Datastar
bull INSPIRE
Examples of application of DCAT
bull CKAN data catalog tool(more in your workshop)
bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances
bull CIARD RING federated data catalogmanaged by GFAR
CIARD RING
Datasets in the RING dataset hub
bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)
bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)
bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)
bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary
a ldquoRING DCAT profilerdquo will be published
Sample dataset record
RDF of the recordltrdfDescription rdfabout=httpringciardnetnode19517gt
ltrdftype rdfresource=httpwwww3orgnsdcatDatasetgt
ltrdftype rdfresource=httprdfsorgnsvoidDatasetgt
ltrdftype rdfresource=httpschemaorgDatasetgt
ltrdftype rdfresource=httpwwww3orgnsadmsAssetgt
ltdcttitlegtNational Soil Database representative Soil Systems geographyltns1titlegt
ltschemanamegtNational Soil Database representative Soil Systems geographyltns2namegt
ltdctdescriptiongtNational Soil Database representative Soil Systems geography (1500000) ltns1descriptiongt
ltdcatlandingPage rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltschemaurl rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltdctpublisher rdfresource=httpringciardnetnode19510gt
ltschemapublisher rdfresource=httpringciardnetnode19510gt
ltdctissued rdfdatatype=xsdgYeargt1990ltns1issuedgt
ltdctspatial rdfresource=httpringciardnettaxonomy_term326gt
ltdctspatialgtNationalltns1spatialgt
ltdcattheme rdfresource=httpringciardnettaxonomy_term2052gt
ltschemaabout rdfresource=httpringciardnettaxonomy_term2052gt
ltdctconformsTo rdfresource=httpringciardnetnode19239gt
ltdctidentifiergthttpringciardnetnode19517rdfltns1identifiergt
ltschemacontentLocation rdfresource=httpringciardnettaxonomy_term326gt
ltdcttype rdfresource=httpringciardnettaxonomy_term81gt
ltdcatcatalog rdfresource=httpringciardnetnode19436gt
ltdcatdistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltschemadistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltvoiddataDump rdfresource=httpringciardnetfield_collection_item5055gt
ltdctrights rdfresource=httpringciardnettaxonomy_term2053gt
ltrdfDescriptiongt
References
1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC)
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with 18th International World Wide Web Conference (WWW 09)
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry: https://github.com/der/ukl-registry-poc/wiki
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org/
• Dataverse: http://dataverse.org/
• Datahub: http://datahub.io/
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org/
• OpenAIRE: https://www.openaire.eu/
• CIARD RING: http://ring.ciard.info/
Dataset description and DCAT
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org
Single dataset
• As long as you consider a dataset alone, you may not need structured metadata about the dataset for further re-use, provided the dataset uses sector standards and is documented in some way
• When managing a single dataset, or a few homogeneous datasets between colleagues, it may be enough to use sector-specific standards (e.g. Multi-crop descriptors and Darwin Core for germplasm, INSPIRE for soil/geo, “Minimum Set” recommendations for observations, sector code lists, vocabularies) or application-specific standards
Datasets in repositories
But what happens when you have a big data repository with heterogeneous datasets?
Or you have few datasets, but you want to contribute them to a huge data repository / catalog that holds many other datasets?
Users will want to find your dataset among many others, possibly together with datasets with similar data, using the same standards, measures, syntax
They will use tools to search for datasets
Then dataset metadata (or a machine-readable dataset description) becomes important
Why machine-readable descriptions
• Data will be re-used by applications. Others will search and make use of your data through tools
Datasets have to be found by applications. Datasets have to be understood by applications
• Datasets should be managed in data repositories / data catalogs. Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets
• Data catalogs themselves are implemented as applications, so they need machine-readable dataset metadata
[Diagram: three levels shown as tables. A dataset is a table with columns Ref ID / URI, Dimension2, Dimension3, Dimension4; each row is a record (“member”, observation) and each cell is a datum. A data repository (file system) is a table of datasets with columns DatasetRef, Metadata1 (e.g. an address), Metadata2, Metadata3. A data catalog (also a dataset) is a table keyed by CatalogID, with Search, Export and Data type functions.]
Tabular only for the sake of simplification: it could be triples or other data structures
We only focus on the data catalog level
Dataset metadata
So what metadata do applications need to find in data catalogs?
Dataset metadata for applications
1) General metadata about the dataset
   1) identifier(s)
   2) who is responsible for it
   3) when and where the data were collected
   4) relations to organizations, persons, publications, software, projects, funding…
   5) the conditions for re-use (rights, licenses)
   6) provenance, versions
   7) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
   8) time and space slices, subsets
   9) the “dimensions” and “variables” covered by the dataset
   10) the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
WHY dimensions and semantics?
Example of search by researcher
“I’m doing research on plant phenotypes: give me all datasets of crop phenotypic data that include the dimensions of time, geographic location and height, plus the units of measure used for time and height, where geographic location is expressed as coordinates (because my software only processes coordinates)”
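The researcher's query can be sketched as a filter over catalog records that declare their dimensions. This is only an illustration: the classes `Dimension` and `DatasetRecord`, the function `find_datasets` and the two sample records are hypothetical and do not come from any real catalog or vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str         # e.g. "time", "height", "location"
    unit: str = ""    # unit of measure, if declared
    syntax: str = ""  # how values are encoded, e.g. "coordinates"


@dataclass
class DatasetRecord:
    identifier: str
    theme: str
    dimensions: list = field(default_factory=list)

    def dimension(self, name):
        """Return the dimension with this name, or None."""
        return next((d for d in self.dimensions if d.name == name), None)


def find_datasets(catalog, theme, required, syntax_constraints):
    """Keep records on `theme` that declare every required dimension and
    encode the constrained dimensions with the requested syntax."""
    hits = []
    for record in catalog:
        if record.theme != theme:
            continue
        if any(record.dimension(name) is None for name in required):
            continue
        if any(getattr(record.dimension(name), "syntax", None) != syntax
               for name, syntax in syntax_constraints.items()):
            continue
        hits.append(record)
    return hits


# Hypothetical catalog content.
catalog = [
    DatasetRecord("ds-1", "crop phenotypes", [
        Dimension("time", unit="day"),
        Dimension("location", syntax="coordinates"),
        Dimension("height", unit="cm"),
    ]),
    DatasetRecord("ds-2", "crop phenotypes", [
        Dimension("time", unit="year"),
        Dimension("location", syntax="place name"),
        Dimension("height", unit="m"),
    ]),
]

hits = find_datasets(catalog, "crop phenotypes",
                     required=["time", "location", "height"],
                     syntax_constraints={"location": "coordinates"})
# Units of measure reported back to the researcher, per dataset.
units = {r.identifier: {d.name: d.unit for d in r.dimensions if d.unit}
         for r in hits}
```

Such a query is only possible if the catalog stores dimensions, units and syntax as structured values rather than free text.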
Data serializations
• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…
• In many dataset description models, the metadata about the data serialization is attached to the dataset
• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is not attached to the dataset but rather to related entities called “distributions”
Distribution metadata for applications
Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:
1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset
   - format (file format, data format)
   - protocol, API, parameters…
And, if different for different distributions, again:
3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies), if different semantics for different distributions
WHY protocols and API params?
Example
Some datasets are available behind a service: e.g. RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets
Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, if it knows the parameters required by each SOAP service and the required syntax and type of the parameters
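A minimal sketch of why parameter metadata matters: if a distribution declares its protocol and parameters, a client can validate and build a request without human help. For brevity the sketch builds a plain HTTP GET URL rather than a SOAP envelope; `Parameter`, `Distribution`, `build_request_url` and the example.org endpoint are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Parameter:
    name: str
    type: type            # expected Python type of the value
    required: bool = False


@dataclass
class Distribution:
    access_url: str
    protocol: str          # e.g. "SPARQL", "SOAP", "REST"
    media_type: str
    parameters: list = field(default_factory=list)


def build_request_url(dist, values):
    """Validate `values` against the declared parameters, then build a GET URL."""
    declared = {p.name for p in dist.parameters}
    unknown = set(values) - declared
    if unknown:
        raise ValueError(f"undeclared parameters: {sorted(unknown)}")
    for p in dist.parameters:
        if p.required and p.name not in values:
            raise ValueError(f"missing required parameter: {p.name}")
        if p.name in values and not isinstance(values[p.name], p.type):
            raise TypeError(f"{p.name} must be {p.type.__name__}")
    return f"{dist.access_url}?{urlencode(values)}"


# Hypothetical distribution description, as a catalog might expose it.
dist = Distribution(
    access_url="http://example.org/api/observations",
    protocol="REST",
    media_type="application/json",
    parameters=[Parameter("crop", str, required=True), Parameter("year", int)],
)
url = build_request_url(dist, {"crop": "maize", "year": 2015})
```

The same loop could run over many distributions from different catalogs, which is the point of the use case above.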
General issue with all metadata
Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.
- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list
And… there is no authority “value vocabulary” or code list for many of these values
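A small sketch of what standardized values buy: free-text labels entered by data providers are mapped to URIs from a code list, and anything outside the list is flagged. The code list `FORMAT_CODE_LIST` and its example.org URIs are invented for illustration; as noted above, no authoritative list exists for many of these values.

```python
# Hypothetical code list: normalized label -> URI.
FORMAT_CODE_LIST = {
    "csv": "http://example.org/formats/csv",
    "rdf/xml": "http://example.org/formats/rdf-xml",
}


def to_uri(label, code_list):
    """Return the URI for a free-text label, or None if it is not in the list."""
    return code_list.get(label.strip().lower())


labels = [" CSV ", "RDF/XML", "xls"]
resolved = {label: to_uri(label, FORMAT_CODE_LIST) for label in labels}
unresolved = [label for label, uri in resolved.items() if uri is None]
```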
No out-of-the-box solution
• Do existing data catalog tools normally cover these metadata? NO
• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO
BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites” (from W3C page on “Best Practices for Publishing Linked Data”)
Dataset description vocabularies
Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability
Semantic interoperability
In this presentation we cover only RDF vocabularies with special focus on semantic interoperability
Dataset metadata have been managed in several ways before semantic technologies: see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets, including dataset metadata at the top and data arrays below
Dataset description vocabularies
• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about dataset (collection) and related distributions
• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services
Definition of “dataset” in DCAT
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”
The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions
Examples of distributions include a downloadable CSV file, an API or an RSS feed
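The dataset / distribution split can be sketched by writing a small description in Turtle by hand. The class and property names (`dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:accessURL`, `dcat:mediaType`, `dct:title`) are DCAT and Dublin Core terms; the helper `dataset_to_turtle`, the example.org URIs and the lack of string escaping are simplifying assumptions, not a complete serializer.

```python
def dataset_to_turtle(dataset_uri, title, distributions):
    """Describe one dcat:Dataset and its dcat:Distribution resources in Turtle.

    Assumes `title` and the media types contain no quotes or backslashes.
    """
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a dcat:Dataset ;",
        f'    dct:title "{title}" ;',
    ]
    dist_refs = " , ".join(f"<{d['uri']}>" for d in distributions)
    lines.append(f"    dcat:distribution {dist_refs} .")
    for d in distributions:
        lines += [
            "",
            f"<{d['uri']}> a dcat:Distribution ;",
            f"    dcat:accessURL <{d['access_url']}> ;",
            f'    dcat:mediaType "{d["media_type"]}" .',
        ]
    return "\n".join(lines)


# One dataset, two distributions: a CSV download and an API (hypothetical URIs).
turtle = dataset_to_turtle(
    "http://example.org/dataset/soil",
    "Example soil dataset",
    [
        {"uri": "http://example.org/dataset/soil/csv",
         "access_url": "http://example.org/files/soil.csv",
         "media_type": "text/csv"},
        {"uri": "http://example.org/dataset/soil/api",
         "access_url": "http://example.org/api/soil",
         "media_type": "application/json"},
    ],
)
print(turtle)
```

The descriptive metadata stays on the dataset, while everything needed to fetch and parse the data sits on each distribution.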
DCAT model
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO
DCAT and DCAT-AP
The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT
It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile
The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme
A diagram of the full DCAT-AP specification is on the next slide
Full DCAT-AP
[Diagram of the full DCAT-AP specification, with callouts: versions; rights and provenance; standards; rights; format; relation]
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO
More than DCAT
Limitations of DCAT
It doesn’t cover:
• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets
Combining DCAT with other vocabularies
“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures
• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
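The three component kinds can be sketched with a plain data structure. This only illustrates the dimension / measure / attribute split and does not use the DataCube RDF vocabulary itself; `Component`, `STRUCTURE`, `split_observation` and the sample observation values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # "dimension", "measure" or "attribute"


# A hypothetical structure definition for a life-expectancy cube.
STRUCTURE = [
    Component("time", "dimension"),
    Component("region", "dimension"),
    Component("gender", "dimension"),
    Component("life_expectancy", "measure"),
    Component("unit", "attribute"),
    Component("status", "attribute"),
]


def split_observation(observation, structure):
    """Group an observation's values by component kind.

    Every dimension must be present, because the dimensions together
    identify the observation; measures and attributes may be absent.
    """
    parts = {"dimension": {}, "measure": {}, "attribute": {}}
    for component in structure:
        if component.name in observation:
            parts[component.kind][component.name] = observation[component.name]
        elif component.kind == "dimension":
            raise ValueError(f"observation lacks dimension {component.name!r}")
    return parts


observation = {
    "time": "2005", "region": "Wales", "gender": "female",
    "life_expectancy": 80.7, "unit": "years", "status": "estimated",
}
parts = split_observation(observation, STRUCTURE)
key = tuple(parts["dimension"].values())  # identifies the observation
```

An application that knows the structure definition can tell which columns locate a value, which column is the value, and which columns qualify it.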
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, also non-statistical
1) dimensions and semantics: YES
2) slices, subsets: YES
More than DCAT-AP
VOID model
[Diagram of the VOID model; licensing properties shown: dct:license, wv:norms, wv:waiver]
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
More than DCAT-AP
Complementing DCAT
• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI
• For API aspects
  • VOID (linked data)
  • Web services descriptions (Hydra, WSDL, WADL)
• For relations to organizations, projects, publications, funding…
  • CERIF for datasets
  • VIVO Datastar
Many vocabularies
Vocabularies with relations to DCAT or same model
• DCAT-AP and other extensions
• W3C HCLS dataset descriptions; DataID (extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information); GeoDCAT-AP (geospatial extension of DCAT)
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org
Other vocabularies
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE
Examples of application of DCAT
• CKAN data catalog tool (more in your workshop)
• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances
• CIARD RING: federated data catalog managed by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary: a “RING DCAT profile” will be published
Sample dataset record
RDF of the record
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
<rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
<rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
<rdf:type rdf:resource="http://schema.org/Dataset"/>
<rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
<dct:title>National Soil Database representative Soil Systems geography</dct:title>
<schema:name>National Soil Database representative Soil Systems geography</schema:name>
<dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
<dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
<schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
<dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
<schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
<dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
<dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
<dct:spatial>National</dct:spatial>
<dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
<schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
<dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
<dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
<schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
<dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
<dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
<dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
<schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
<void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
<dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
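A consuming application could read a record of this shape with nothing more than an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened stand-in record: the `rdf:RDF` wrapper and namespace declarations are assumed (the slide omits them), and the example.org URIs and title are placeholders. A real client would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for a catalog record, with assumed namespace declarations.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://example.org/dataset/1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title>Example soil dataset</dct:title>
    <dcat:distribution rdf:resource="http://example.org/distribution/1"/>
  </rdf:Description>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"  # prefix for rdf:* attributes

root = ET.fromstring(RDF_XML)
record = root.find("rdf:Description", NS)

subject = record.get(RDF + "about")
title = record.findtext("dct:title", namespaces=NS)
types = [e.get(RDF + "resource") for e in record.findall("rdf:type", NS)]
distributions = [e.get(RDF + "resource")
                 for e in record.findall("dcat:distribution", NS)]

is_dcat_dataset = "http://www.w3.org/ns/dcat#Dataset" in types
```

Because the record repeats the same facts under several vocabularies (DCAT, VOID, Schema.org), a client that understands any one of them can still find the title and the distributions.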
References1 Issues and Recommendations Associated with Distributed Computation and Data
Management Systems for the Space SciencesNational Research Council (US) Space Science BoardNational Academies 1986 - 111 paginehttpsbooksgooglecoukbooksid=h4krAAAAYAAJ
2 Sacchi S Wickett K M amp Renear A H (2010) Dataset definitions Champaign IL Center for Informatics Research in Science and Scholarship (Rep No CIRSSDATACONS--20101VER01+DCDC)
3 Alexander K Cyganiak R Hausenblas M amp Zhao J (2009) Describing Linked Datasets-On the Design and Usage of voiD theVocabulary of Interlinked Datasets In Linked Data on the Web Workshop (LDOW 09) in conjunction with 18th International World Wide Web Conference (WWW 09)
4 Renear A H Sacchi S Wickett K M (2010) Definitions of Dataset in the Scientific and Technical Literature httpmailasistorgasist2010proceedingsproceedingsASIST_AM10submissions240_Final_Submissionpdf
5 W3C Government Linked Data Working Grouphttpwwww3org2011gldwikiMain_Page
6 UK Gov Linked Data Working Group LD registryhttpsgithubcomderukl-registry-pocwiki
Vocabularies and catalogsbull DCAT httpwwww3orgTRvocab-dcat
bull DCAT AP httpsjoinupeceuropaeuassetdcat_application_profileasset_releasedcat-application-
profile-data-portals-europe-final
bull DataCube httppurlorglinked-datacube
bull VOID httprdfsorgnsvoid-guide
bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml
bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology
bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables
bull CKAN httpckanorg
bull Dataverse httpdataverseorg
bull Datahub httpdatahubio
bull DataCite httpsearchdataciteorguiq=subject3Aagriculture
bull Re3data httpwwwre3dataorg
bull OpenAIRE httpswwwopenaireeu
bull CIARD RING httpringciardinfo
Dataset description and DCAT
DRTCISI-ICSUCODATAInternational Workshop on
Open Data Repositories
Thank you
Valeria Pesce
valeriapescefaoorg
Datasets in repositories
But what happens when you have a big data repository with heterogeneous datasets
Or you have few datasets but you want to contribute them to a huge data repositorycatalog where many other datasets are
Users will want to find your dataset among many others possibly together with datasets with similar data using the same standards measures syntax
They will use tools to search for datasets
Then dataset metadata (or a machine-readable dataset description)
becomes important
Why machine-readable descriptions
bull Data will be re-used by applicationsOthers will search and make use of your data through tools
Datasets have to be found by applications Datasets have to be understood by applications
bull Datasets should be managed in data repositories data catalogs Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets
bull Data catalogs themselves are implemented as applications so they need machine-readable dataset metadata
Ref ID URI Dimension2 Dimension3 Dimension4
Value11 Value12 Value13 Value14
Value21 Value22 Value23 Value24
Value31 Value32 Value33 Value34
Value41 Value42 Value43 Value44
Dataset
Datum
Record ldquomemberrdquo observation
File system
Data repository
DatasetRef Matadata1 Metadata2 Metadata3
Ref1 Address1 Value12 Value13
Ref2 Address2 Value22 Value23
Data catalog (also a dataset)
Tabular only for the sake of simplification it could be triples or other data structures
CatalogID Value1 Value2 Value3
Search ExportData type
We only focus on the data catalog level
Dataset metadata
So what metadata do applications need to find in data catalogs
Dataset metadata for applications
1) General metadata about the dataset1) identifier(s)2) who is responsible for it3) when and where the data were collected4) relations to organizations persons publications software
projects fundinghellip5) the conditions for re-use (rights licenses)6) provenance versions7) the specific coverage of the dataset (type of data thematic
coverage geographic coverage)
8) time and space slices subsets 9) the ldquodimensionsrdquo and ldquovariablesrdquo covered by the dataset10) the semantics of the dimensions (units of measure time
granularity syntax reference taxonomies)
WHY dimensions and semantics
Example of search by researcher
Irsquom doing research on plant phenotypes give me all datasets of crop phenotypic data that include the dimensions of time geographic location and height plus units of measure used for time and height where geographic location is expressed as coordinates (because my software only processes coordinates)
Data seralizations
bull The metadata above refer to the collection as a whole but additional technical metadata refer to the different ldquoserializationsrdquo of the datahellip
bull In many dataset description models the metadata about the data serialization is attached to the dataset
bull In other dataset description models information about the data serializations is not considered inherent to the nature and content of the data collection so it is not attached to the dataset but rather to related entities called ldquodistributionsrdquo
Distribution metadata for applications
Applications have to find metadata about the actual ldquoserializationsrdquo or ldquodistributionsrdquo of the dataset to understand
1 Where to retrieve it URL (data dump servicehellip)
2 the necessary technical specifications to retrieve and parse a
distribution of the dataset
- format (file format data format)
- protocol API parametershellip
And if different for different distributions again3 the conditions for re-use (rights licenses)
4 the semantics of the dimensions (units of measure time granularity syntax reference taxonomies) if different semantics for different distributions
WHY protocols and API params
Example
Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets
Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters
General issue with all metadata
Standardization of the values eg for ldquothematic coveragerdquo or ldquodimensionsrdquo of datasets ldquoformatrdquo or ldquoprotocol usedrdquo of distributions etc
- The value should be standardized possibly a URI
- The value should be part of an authority list code list
Andhellip There is no authority ldquovalue vocabularyrdquo or code list for many of these values
No out-of-the-box solution
bull Do existing data catalog tools normally cover these metadataNO
bull Do existing metadata vocabularies (RDF and not) cover all these metadata Or do they adopt the same modelNO
BUT by using even basic metadata to describe datasets in data catalogs ldquopublishers increase discoverability and enable applications to consume metadata from multiple catalogs This further enables decentralized publishing of catalogs and facilitates federated dataset search across sitesrdquo(from W3C page on ldquoBest Practices for Publishing Linked Datardquo)
Dataset description vocabularies
Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed
for full interoperability
Semantic interoperability
In this presentation we cover only RDF vocabularies with special focus on semantic interoperability
Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below
Dataset description vocabularies
bull DCAT vocabularybull RDF vocabulary for describing any dataset
bull Datasets can be standalone or part of a ldquocatalogrdquo
bull Metadata about dataset (collection) and related distributions
bull DataCube vocabularybull RDF vocabulary for describing statistical datasets
bull Useful for attaching metadata about the ldquodata structurerdquo of a dataset
bull VOID vocabularybull RDF vocabulary for expressing metadata about RDF datasets
bull Useful especially for metadata related to RDF data services
Definition of ldquodatasetrdquo in DCAT
Definition given by the W3C Government Linked Data Working Group
A dataset is ldquoa collection of data published or curated by a single source and available for access or download in one or more formatsrdquo
The ldquoinstancesrdquo of the dataset ldquoavailable for access or download in one or more formatsrdquo are called ldquodistributionsrdquo A dataset can have many distributions
Examples of distributions include a downloadable CSV file an API or an RSS feed
c
DCAT model
1) identifier(s)2) who is responsible for it3) when and where the data were
collected4) relations to organizations persons
publications software projects fundinghellip NO
5) the conditions for re-use (rights licenses)
6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO
DCAT and DCAT-AP
The DCAT Application profile for data portals in
Europe (DCAT-AP) is an extension of DCAT
It combines DCAT with the W3C Asset Description Metadata
Schema (ADMS) vocabulary plus classes and properties from
Dublin Core SKOS and Vcard in an Application profile
The elaboration of the DCAT-AP was a joint initiative of DG
CONNECT the EU Publications Office and the ISA Programme
A diagram of the full DCAT-AP specification is on the next slide
Full DCAT AP
versions
rights and provenance
standards
rights
format
relation
1) who is responsible for it MORE2) relations to organizations projects publications funding Partly
3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO
More than DCAT
Limitations of DCAT
It doesnrsquot cover
bull semantic relations to organisations persons software projects fundinghellip
bull dimensions and variables and syntax semantics of dimensions and variables
bull protocols and parameters for datasets available through APIs
bull time and space slices subsets
Combining DCAT with other vocabularies
ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions attributes and measures
bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)
bull The measure components represent the phenomenon being observed (eg life expectancy)
bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset also non-statistical
1) dimensions and semantics YES2) slices subsets YES
More than DCAT-AP
VOID model
c
dctlicensewvnorms wvwaiver
3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES
More than DCAT-AP
Complementing DCAT
bull For dimensions semantics of dimensions slicingbull DataCube
bull DDI
bull For API aspectsbull VOID (linked data)
bull Web services descriptions (Hydra (WSDL WADL))
bull For relations to organizations projects publications fundinghellipbull CERIF for datasets
bull VIVO Datastar
Many vocabulariesVocabularies with relations to DCAT or same model
bull DCAT-AP and other extensions
bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT
bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)
bull Schemaorg
Other vocabulariesbull DataCite and re3data
bull CERIF for datasets
bull VIVO Datastar
bull INSPIRE
Examples of application of DCAT
bull CKAN data catalog tool(more in your workshop)
bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances
bull CIARD RING federated data catalogmanaged by GFAR
CIARD RING
Datasets in the RING dataset hub
bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)
bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)
bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)
bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary
a ldquoRING DCAT profilerdquo will be published
Sample dataset record
RDF of the recordltrdfDescription rdfabout=httpringciardnetnode19517gt
ltrdftype rdfresource=httpwwww3orgnsdcatDatasetgt
ltrdftype rdfresource=httprdfsorgnsvoidDatasetgt
ltrdftype rdfresource=httpschemaorgDatasetgt
ltrdftype rdfresource=httpwwww3orgnsadmsAssetgt
ltdcttitlegtNational Soil Database representative Soil Systems geographyltns1titlegt
ltschemanamegtNational Soil Database representative Soil Systems geographyltns2namegt
ltdctdescriptiongtNational Soil Database representative Soil Systems geography (1500000) ltns1descriptiongt
ltdcatlandingPage rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltschemaurl rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltdctpublisher rdfresource=httpringciardnetnode19510gt
ltschemapublisher rdfresource=httpringciardnetnode19510gt
ltdctissued rdfdatatype=xsdgYeargt1990ltns1issuedgt
ltdctspatial rdfresource=httpringciardnettaxonomy_term326gt
ltdctspatialgtNationalltns1spatialgt
ltdcattheme rdfresource=httpringciardnettaxonomy_term2052gt
ltschemaabout rdfresource=httpringciardnettaxonomy_term2052gt
ltdctconformsTo rdfresource=httpringciardnetnode19239gt
ltdctidentifiergthttpringciardnetnode19517rdfltns1identifiergt
ltschemacontentLocation rdfresource=httpringciardnettaxonomy_term326gt
ltdcttype rdfresource=httpringciardnettaxonomy_term81gt
ltdcatcatalog rdfresource=httpringciardnetnode19436gt
ltdcatdistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltschemadistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltvoiddataDump rdfresource=httpringciardnetfield_collection_item5055gt
ltdctrights rdfresource=httpringciardnettaxonomy_term2053gt
ltrdfDescriptiongt
References1 Issues and Recommendations Associated with Distributed Computation and Data
Management Systems for the Space SciencesNational Research Council (US) Space Science BoardNational Academies 1986 - 111 paginehttpsbooksgooglecoukbooksid=h4krAAAAYAAJ
2 Sacchi S Wickett K M amp Renear A H (2010) Dataset definitions Champaign IL Center for Informatics Research in Science and Scholarship (Rep No CIRSSDATACONS--20101VER01+DCDC)
3 Alexander K Cyganiak R Hausenblas M amp Zhao J (2009) Describing Linked Datasets-On the Design and Usage of voiD theVocabulary of Interlinked Datasets In Linked Data on the Web Workshop (LDOW 09) in conjunction with 18th International World Wide Web Conference (WWW 09)
4 Renear A H Sacchi S Wickett K M (2010) Definitions of Dataset in the Scientific and Technical Literature httpmailasistorgasist2010proceedingsproceedingsASIST_AM10submissions240_Final_Submissionpdf
5 W3C Government Linked Data Working Grouphttpwwww3org2011gldwikiMain_Page
6 UK Gov Linked Data Working Group LD registryhttpsgithubcomderukl-registry-pocwiki
Vocabularies and catalogsbull DCAT httpwwww3orgTRvocab-dcat
bull DCAT AP httpsjoinupeceuropaeuassetdcat_application_profileasset_releasedcat-application-
profile-data-portals-europe-final
bull DataCube httppurlorglinked-datacube
bull VOID httprdfsorgnsvoid-guide
bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml
bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology
bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables
bull CKAN httpckanorg
bull Dataverse httpdataverseorg
bull Datahub httpdatahubio
bull DataCite httpsearchdataciteorguiq=subject3Aagriculture
bull Re3data httpwwwre3dataorg
bull OpenAIRE httpswwwopenaireeu
bull CIARD RING httpringciardinfo
Dataset description and DCAT
DRTCISI-ICSUCODATAInternational Workshop on
Open Data Repositories
Thank you
Valeria Pesce
valeriapescefaoorg
Why machine-readable descriptions
bull Data will be re-used by applicationsOthers will search and make use of your data through tools
Datasets have to be found by applications Datasets have to be understood by applications
bull Datasets should be managed in data repositories data catalogs Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets
bull Data catalogs themselves are implemented as applications so they need machine-readable dataset metadata
[Figure: levels of data organization. A datum is a single value in a table; a record (a “member”, an observation) is one row; a dataset is the whole table, with a Ref ID / URI column followed by dimension columns. Datasets live in a file system or data repository. A data catalog (itself also a dataset) is a table of dataset references, addresses and metadata values, with the labels Search, Export and Data type.]
The example is tabular only for the sake of simplification: it could be triples or other data structures.
We only focus on the data catalog level.
Dataset metadata
So what metadata do applications need to find in data catalogs?
Dataset metadata for applications
General metadata about the dataset:
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…
5) the conditions for re-use (rights, licenses)
6) provenance, versions
7) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
8) time and space slices, subsets
9) the “dimensions” and “variables” covered by the dataset
10) the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
WHY dimensions and semantics?
Example of search by a researcher:
“I’m doing research on plant phenotypes: give me all datasets of crop phenotypic data that include the dimensions of time, geographic location and height, plus the units of measure used for time and height, where geographic location is expressed as coordinates (because my software only processes coordinates).”
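The researcher's query above can be sketched as a filter over catalog records. This is only an illustration under stated assumptions: the `Dimension` and `DatasetRecord` classes, the `find_datasets` function and the three sample records are hypothetical and do not come from any catalog tool or vocabulary discussed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dimension:
    """One dimension of a dataset plus the semantics an application needs."""
    name: str                      # e.g. "time", "location", "height"
    unit: Optional[str] = None     # unit of measure, if any
    syntax: Optional[str] = None   # how values are encoded, e.g. "coordinates"


@dataclass
class DatasetRecord:
    """A hypothetical catalog entry: identifier, theme and described dimensions."""
    identifier: str
    theme: str
    dimensions: Dict[str, Dimension] = field(default_factory=dict)


def dims(*items: Dimension) -> Dict[str, Dimension]:
    """Index dimensions by name."""
    return {d.name: d for d in items}


def find_datasets(catalog: List[DatasetRecord], theme: str,
                  required: List[str],
                  syntax_constraints: Dict[str, str]) -> List[DatasetRecord]:
    """Keep records on the theme that describe every required dimension
    and encode the constrained dimensions with the wanted syntax."""
    matches = []
    for record in catalog:
        if record.theme != theme:
            continue
        if not all(name in record.dimensions for name in required):
            continue
        if any(getattr(record.dimensions.get(name), "syntax", None) != wanted
               for name, wanted in syntax_constraints.items()):
            continue
        matches.append(record)
    return matches


# Made-up catalog content, for illustration only.
catalog = [
    DatasetRecord("ds-1", "crop phenotypes", dims(
        Dimension("time", unit="day"),
        Dimension("location", syntax="coordinates"),
        Dimension("height", unit="cm"))),
    DatasetRecord("ds-2", "crop phenotypes", dims(
        Dimension("time", unit="year"),
        Dimension("location", syntax="place name"),
        Dimension("height", unit="m"))),
    DatasetRecord("ds-3", "soil", dims(
        Dimension("location", syntax="coordinates"))),
]

hits = find_datasets(catalog, "crop phenotypes",
                     required=["time", "location", "height"],
                     syntax_constraints={"location": "coordinates"})
for record in hits:
    units = {name: record.dimensions[name].unit for name in ("time", "height")}
    print(record.identifier, units)
```

Such a query is only answerable if the catalog records carry the dimensions and their semantics in a machine-readable form, which is the point of the slide.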
Data serializations
• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…
• In many dataset description models, the metadata about the data serialization is attached to the dataset.
• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is not attached to the dataset but rather to related entities called “distributions”.
Distribution metadata for applications
Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:
1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset:
   - format (file format, data format)
   - protocol, API, parameters…
And, if different for different distributions, again:
3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies), if the semantics differ between distributions
WHY protocols and API params?
Example:
Some datasets are available behind a service: e.g. RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets.
Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, if it knows the parameters required by the SOAP service and the required syntax and type of the parameters.
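A minimal sketch of this use case follows, assuming the distribution metadata lists each service parameter with its name, type and whether it is required. A plain HTTP GET service stands in for SOAP here to keep the example short; the class names, the `example.org` URL and the parameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List
from urllib.parse import urlencode


@dataclass
class Parameter:
    """A service parameter as it could be described in distribution metadata."""
    name: str
    py_type: type
    required: bool = False


@dataclass
class ServiceDistribution:
    """A distribution that is reached through a parameterized service."""
    access_url: str
    protocol: str
    parameters: List[Parameter]


def build_request_url(dist: ServiceDistribution, values: Dict[str, Any]) -> str:
    """Validate the supplied values against the described parameters
    and build the request URL."""
    described = {p.name for p in dist.parameters}
    unknown = sorted(set(values) - described)
    if unknown:
        raise ValueError(f"not described by the service metadata: {unknown}")
    pairs = []
    for p in dist.parameters:
        if p.name not in values:
            if p.required:
                raise ValueError(f"missing required parameter: {p.name}")
            continue
        if not isinstance(values[p.name], p.py_type):
            raise TypeError(f"{p.name} must be of type {p.py_type.__name__}")
        pairs.append((p.name, values[p.name]))
    return f"{dist.access_url}?{urlencode(pairs)}"


# Hypothetical service description.
service = ServiceDistribution(
    access_url="http://example.org/observations",
    protocol="HTTP GET",
    parameters=[Parameter("crop", str, required=True), Parameter("year", int)],
)
url = build_request_url(service, {"crop": "wheat", "year": 2015})
print(url)
```

Because the client reads the parameter list from metadata rather than hard-coding it, the same function could in principle address several services, provided each one is described in the same way.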
General issue with all metadata
Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.
- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list
And… there is no authority “value vocabulary” or code list for many of these values.
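To illustrate what standardized values buy an application, here is a small sketch that maps free-text values to URIs from a code list. The code list itself is invented for the example (the `example.org` URIs do not exist), precisely because, as noted above, no authority list exists for many of these values.

```python
from typing import Dict, Optional

# Hypothetical code list for distribution formats: URI -> preferred label.
FORMAT_CODES: Dict[str, str] = {
    "http://example.org/code/format/csv": "CSV",
    "http://example.org/code/format/rdf-xml": "RDF/XML",
    "http://example.org/code/format/netcdf": "NetCDF",
}


def to_code(value: str, code_list: Dict[str, str]) -> Optional[str]:
    """Return the code-list URI for a value, or None if it is not in the list.

    A value is accepted if it already is a URI from the list, or if it
    matches a label ignoring case and surrounding whitespace.
    """
    if value in code_list:
        return value
    wanted = value.strip().lower()
    for uri, label in code_list.items():
        if label.lower() == wanted:
            return uri
    return None


print(to_code(" csv ", FORMAT_CODES))
print(to_code("spreadsheet", FORMAT_CODES))
```

Values that come back as `None` are the ones a catalog would have to reject or map by hand, which is where the lack of shared code lists hurts.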
No out-of-the-box solution
• Do existing data catalog tools normally cover these metadata? NO
• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO
BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites.” (from the W3C page on “Best Practices for Publishing Linked Data”)
Dataset description vocabularies
Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability.
Semantic interoperability
In this presentation we cover only RDF vocabularies, with special focus on semantic interoperability.
Dataset metadata have been managed in several ways before semantic technologies: see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets – including dataset metadata at the top and data arrays below.
Dataset description vocabularies
• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about dataset (collection) and related distributions
• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services
Definition of “dataset” in DCAT
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.
The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.
DCAT model
[Figure: DCAT model diagram]
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO
DCAT and DCAT-AP
The DCAT Application profile for data portals in Europe (DCAT-AP) is an extension of DCAT.
It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an Application profile.
The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.
A diagram of the full DCAT-AP specification is on the next slide.
Full DCAT-AP
[Figure: diagram of the full DCAT-AP specification, with callouts for versions, rights and provenance, standards, rights, format and relation]
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO
More than DCAT
Limitations of DCAT
It doesn’t cover:
• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets
Combining DCAT with other vocabularies
“Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures.
• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
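The three kinds of components can be sketched as a small structure definition against which observations are checked. This is not the DataCube vocabulary itself (which expresses the same idea in RDF); `CubeStructure`, `check_observation` and the sample values are hypothetical and only mirror the dimension / measure / attribute split described above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class CubeStructure:
    """Names of the components a cube declares."""
    dimensions: Tuple[str, ...]          # identify an observation
    measures: Tuple[str, ...]            # the phenomenon being observed
    attributes: Tuple[str, ...] = ()     # units, scaling, status (optional)


def check_observation(structure: CubeStructure,
                      observation: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the observation fits.

    Every dimension and measure must have a value; attributes are optional;
    keys that the structure does not declare are reported.
    """
    problems = []
    for name in structure.dimensions + structure.measures:
        if name not in observation:
            problems.append(f"missing {name}")
    declared = set(structure.dimensions + structure.measures + structure.attributes)
    for name in observation:
        if name not in declared:
            problems.append(f"undeclared {name}")
    return problems


structure = CubeStructure(
    dimensions=("time", "region", "gender"),
    measures=("life_expectancy",),
    attributes=("unit", "status"),
)

# Placeholder values, not real statistics.
complete = {"time": "2010", "region": "region-A", "gender": "female",
            "life_expectancy": 80.0, "unit": "years", "status": "estimated"}
incomplete = {"time": "2010", "life_expectancy": 80.0, "colour": "red"}

print(check_observation(structure, complete))
print(check_observation(structure, incomplete))
```

The same declaration that validates observations also tells a search application which dimensions a dataset has and in which units its measures are expressed.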
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, also non-statistical.
1) dimensions and semantics: YES
2) slices, subsets: YES
More than DCAT-AP
VOID model
[Figure: VOID model diagram, with callouts for dct:license, wv:norms and wv:waiver]
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
More than DCAT-AP
Complementing DCAT
• For dimensions, semantics of dimensions, slicing:
  • DataCube
  • DDI
• For API aspects:
  • VOID (linked data)
  • Web services descriptions (Hydra, (WSDL, WADL))
• For relations to organizations, projects, publications, funding…:
  • CERIF for datasets
  • VIVO Datastar
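The coverage ratings given on the preceding slides can be put in a small table to see which gaps remain for a given combination of vocabularies. The ratings below are transcribed from those slides; treating a requirement that a slide does not rate as "no" is an assumption of this sketch, as are the short requirement names and the helper functions.

```python
from typing import Dict, List

# Requirements that plain DCAT was rated "NO" on.
NEEDS = ["relations", "provenance", "dimensions", "slices", "protocols"]

# Ratings as given on the slides; a missing entry means the slide gave none.
COVERAGE: Dict[str, Dict[str, str]] = {
    "DCAT": {"relations": "no", "provenance": "no", "dimensions": "no",
             "slices": "no", "protocols": "no"},
    "DCAT-AP": {"relations": "partly", "provenance": "yes",
                "dimensions": "no", "protocols": "no"},
    "DataCube": {"dimensions": "yes", "slices": "yes"},
    "VOID": {"protocols": "partly", "dimensions": "partly", "slices": "yes"},
}

RANK = {"no": 0, "partly": 1, "yes": 2}


def best_rating(vocabularies: List[str], need: str) -> str:
    """Best rating any of the chosen vocabularies has for one requirement."""
    ratings = [COVERAGE[v].get(need, "no") for v in vocabularies]
    return max(ratings, key=RANK.__getitem__)


def gaps(vocabularies: List[str]) -> Dict[str, str]:
    """Requirements that the combination does not fully cover."""
    result = {}
    for need in NEEDS:
        rating = best_rating(vocabularies, need)
        if rating != "yes":
            result[need] = rating
    return result


print(gaps(["DCAT"]))
print(gaps(["DCAT-AP", "DataCube", "VOID"]))
```

Under these ratings, DCAT-AP plus DataCube and VOID still leaves relations and protocols only partly covered, which matches the suggestion above to look at CERIF or VIVO Datastar for relations and at web service descriptions for API aspects.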
Many vocabularies
Vocabularies with relations to DCAT or the same model:
• DCAT-AP and other extensions
• W3C HCLS dataset descriptions; DataID (extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information); GeoDCAT-AP (geospatial extension of DCAT)
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org
Other vocabularies:
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE
Examples of application of DCAT
• CKAN data catalog tool (more in your workshop)
• Data catalogs:
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances/
• CIARD RING: federated data catalog managed by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model).
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists).
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL).
• The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published.
Sample dataset record
RDF of the record:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:void="http://rdfs.org/ns/void#"
         xmlns:schema="http://schema.org/">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <rdf:type rdf:resource="http://schema.org/Dataset"/>
    <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
    <dct:title>National Soil Database: representative Soil Systems geography</dct:title>
    <schema:name>National Soil Database: representative Soil Systems geography</schema:name>
    <dct:description>National Soil Database: representative Soil Systems geography (1:500000)</dct:description>
    <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
    <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
    <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
    <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
    <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
    <dct:spatial>National</dct:spatial>
    <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
    <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
    <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
    <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
    <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
    <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
    <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
  </rdf:Description>
</rdf:RDF>
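As an illustration of how an application could consume such a record, the sketch below reads a shortened excerpt of it with Python's standard XML parser. The excerpt is a cleaned-up reconstruction: the punctuation of the URIs and the namespace declarations are assumptions, and a real application would more likely use an RDF library than a plain XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Shortened, reconstructed excerpt of the sample record (URIs are assumptions).
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>
"""


def resources(element, namespace, name):
    """Collect the rdf:resource targets of all child elements with this name."""
    return [child.get(f"{{{RDF}}}resource")
            for child in element.findall(f"{{{namespace}}}{name}")]


root = ET.fromstring(SAMPLE.strip())
record = root.find(f"{{{RDF}}}Description")

summary = {
    "uri": record.get(f"{{{RDF}}}about"),
    "title": record.findtext(f"{{{DCT}}}title"),
    "types": resources(record, RDF, "type"),
    "landing_page": resources(record, DCAT, "landingPage"),
    "distributions": resources(record, DCAT, "distribution"),
}
for key, value in summary.items():
    print(key, "=", value)
```

The distribution URI found this way would then be dereferenced to obtain the format, protocol and URL needed to fetch the data.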
Dataset metadata
So what metadata do applications need to find in data catalogs
Dataset metadata for applications
1) General metadata about the dataset1) identifier(s)2) who is responsible for it3) when and where the data were collected4) relations to organizations persons publications software
projects fundinghellip5) the conditions for re-use (rights licenses)6) provenance versions7) the specific coverage of the dataset (type of data thematic
coverage geographic coverage)
8) time and space slices subsets 9) the ldquodimensionsrdquo and ldquovariablesrdquo covered by the dataset10) the semantics of the dimensions (units of measure time
granularity syntax reference taxonomies)
WHY dimensions and semantics
Example of search by researcher
Irsquom doing research on plant phenotypes give me all datasets of crop phenotypic data that include the dimensions of time geographic location and height plus units of measure used for time and height where geographic location is expressed as coordinates (because my software only processes coordinates)
Data seralizations
bull The metadata above refer to the collection as a whole but additional technical metadata refer to the different ldquoserializationsrdquo of the datahellip
bull In many dataset description models the metadata about the data serialization is attached to the dataset
bull In other dataset description models information about the data serializations is not considered inherent to the nature and content of the data collection so it is not attached to the dataset but rather to related entities called ldquodistributionsrdquo
Distribution metadata for applications
Applications have to find metadata about the actual ldquoserializationsrdquo or ldquodistributionsrdquo of the dataset to understand
1 Where to retrieve it URL (data dump servicehellip)
2 the necessary technical specifications to retrieve and parse a
distribution of the dataset
- format (file format data format)
- protocol API parametershellip
And if different for different distributions again3 the conditions for re-use (rights licenses)
4 the semantics of the dimensions (units of measure time granularity syntax reference taxonomies) if different semantics for different distributions
WHY protocols and API params
Example
Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets
Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters
General issue with all metadata
Standardization of the values eg for ldquothematic coveragerdquo or ldquodimensionsrdquo of datasets ldquoformatrdquo or ldquoprotocol usedrdquo of distributions etc
- The value should be standardized possibly a URI
- The value should be part of an authority list code list
Andhellip There is no authority ldquovalue vocabularyrdquo or code list for many of these values
No out-of-the-box solution
bull Do existing data catalog tools normally cover these metadataNO
bull Do existing metadata vocabularies (RDF and not) cover all these metadata Or do they adopt the same modelNO
BUT by using even basic metadata to describe datasets in data catalogs ldquopublishers increase discoverability and enable applications to consume metadata from multiple catalogs This further enables decentralized publishing of catalogs and facilitates federated dataset search across sitesrdquo(from W3C page on ldquoBest Practices for Publishing Linked Datardquo)
Dataset description vocabularies
Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed
for full interoperability
Semantic interoperability
In this presentation we cover only RDF vocabularies with special focus on semantic interoperability
Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below
Dataset description vocabularies
• DCAT vocabulary
  - RDF vocabulary for describing any dataset
  - Datasets can be standalone or part of a “catalog”
  - Metadata about the dataset (collection) and its related distributions
• DataCube vocabulary
  - RDF vocabulary for describing statistical datasets
  - Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
  - RDF vocabulary for expressing metadata about RDF datasets
  - Useful especially for metadata related to RDF data services
Definition of “dataset” in DCAT
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.
The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.
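The dataset / distribution split can be illustrated with a small sketch that models one dataset with two distributions and lists the statements a DCAT description would contain. The property names (`dcat:distribution`, `dcat:accessURL`, `dcat:mediaType`, `dct:title`) are taken from the DCAT vocabulary; the Python classes, the example URIs and the use of plain tuples instead of an RDF library are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    uri: str
    access_url: str
    media_type: str


@dataclass
class Dataset:
    uri: str
    title: str
    distributions: list = field(default_factory=list)

    def triples(self):
        """Yield (subject, predicate, object) statements describing the dataset."""
        yield (self.uri, "rdf:type", "dcat:Dataset")
        yield (self.uri, "dct:title", self.title)
        for dist in self.distributions:
            yield (self.uri, "dcat:distribution", dist.uri)
            yield (dist.uri, "rdf:type", "dcat:Distribution")
            yield (dist.uri, "dcat:accessURL", dist.access_url)
            yield (dist.uri, "dcat:mediaType", dist.media_type)


# Hypothetical dataset offered both as a CSV download and through an API.
dataset = Dataset(
    uri="http://example.org/dataset/1",
    title="Example dataset",
    distributions=[
        Distribution("http://example.org/dataset/1/csv", "http://example.org/files/1.csv", "text/csv"),
        Distribution("http://example.org/dataset/1/api", "http://example.org/api/1", "application/json"),
    ],
)
all_triples = list(dataset.triples())
```

Descriptive metadata such as the title is stated once on the dataset, while everything needed to fetch and parse the data is stated per distribution.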
DCAT model
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO
DCAT and DCAT-AP
The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.
It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.
The DCAT-AP was developed as a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.
A diagram of the full DCAT-AP specification is on the next slide.
Full DCAT AP
[Diagram of the full DCAT-AP model. Highlighted areas: versions, rights and provenance, standards, rights, format, relation]
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO
More than DCAT
Limitations of DCAT
It doesn’t cover:
• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets
Combining DCAT with other vocabularies
“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures.
• The dimension components serve to identify the observations (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components allow the specification of the units of measure, any scaling factors, and metadata such as the status of the observation (e.g. estimated, provisional)
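The three kinds of components can be sketched as a small data structure that an application could use to check whether an observation fits the declared structure. This is a plain-Python illustration of the idea, not the DataCube RDF vocabulary itself: the `DataStructure` class, its `check` method and the component names are hypothetical, and the rule applied (every dimension and measure must be present, attributes are optional) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructure:
    dimensions: tuple          # identify an observation, e.g. time, region
    measures: tuple            # the observed phenomenon, e.g. life expectancy
    attributes: tuple = ()     # qualify the value, e.g. unit, status

    def check(self, observation: dict) -> list:
        """Return a list of problems; an empty list means the observation fits."""
        problems = []
        for name in self.dimensions + self.measures:
            if name not in observation:
                problems.append(f"missing {name}")
        allowed = set(self.dimensions + self.measures + self.attributes)
        for name in observation:
            if name not in allowed:
                problems.append(f"unexpected {name}")
        return problems


structure = DataStructure(
    dimensions=("time", "region", "gender"),
    measures=("life_expectancy",),
    attributes=("unit", "status"),
)

good = {"time": "2010", "region": "IT", "gender": "F",
        "life_expectancy": 84.0, "unit": "years", "status": "estimated"}
bad = {"time": "2010", "region": "IT", "life_expectancy": 84.0, "source": "survey"}

good_problems = structure.check(good)
bad_problems = structure.check(bad)
```

Because the structure is described separately from the data, the same description can be published as metadata and reused by any application that needs to know which dimensions a dataset offers.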
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.
1) dimensions and semantics: YES
2) slices, subsets: YES
More than DCAT-AP
VOID model
[Diagram of the VOID model. Licensing properties shown: dct:license, wv:norms, wv:waiver]
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
More than DCAT-AP
Complementing DCAT
• For dimensions, semantics of dimensions, slicing:
  - DataCube
  - DDI
• For API aspects:
  - VOID (linked data)
  - Web services descriptions: Hydra (WSDL, WADL)
• For relations to organizations, projects, publications, funding…:
  - CERIF for datasets
  - VIVO Datastar
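The coverage checklists from the preceding slides can be put into one small lookup table, so that for each metadata need it is easy to see which vocabulary to combine with DCAT. The ratings below are copied from those checklists (needs a slide does not rate are simply left out); the short need names, the `COVERAGE` table layout and the `candidates` function are assumptions made for this sketch.

```python
# Ratings as given on the DCAT, DCAT-AP, DataCube and VOID slides.
COVERAGE = {
    "DCAT": {"relations": "NO", "provenance": "NO", "dimensions": "NO",
             "slices": "NO", "protocols": "NO"},
    "DCAT-AP": {"relations": "Partly", "provenance": "YES",
                "dimensions": "NO", "protocols": "NO"},
    "DataCube": {"dimensions": "YES", "slices": "YES"},
    "VOID": {"protocols": "Partly", "dimensions": "Partly", "slices": "YES"},
}


def candidates(need: str) -> list:
    """Vocabularies rated YES or Partly for a need, with YES listed first."""
    rated = [
        (vocabulary, ratings[need])
        for vocabulary, ratings in COVERAGE.items()
        if ratings.get(need) in ("YES", "Partly")
    ]
    # sorted() is stable, so vocabularies with the same rating keep table order.
    return [vocabulary for vocabulary, rating in sorted(rated, key=lambda r: r[1] != "YES")]


for_dimensions = candidates("dimensions")
for_protocols = candidates("protocols")
for_provenance = candidates("provenance")
```

The table makes the gap visible: no vocabulary in it is rated YES for protocols and parameters, which is why web service description languages appear in the list above.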
Many vocabularies
Vocabularies with relations to DCAT or the same model:
• DCAT-AP and other extensions
  - W3C HCLS dataset descriptions
  - DataID: extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information
  - GeoDCAT-AP: geospatial extension of DCAT
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org
Other vocabularies:
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE
Examples of application of DCAT
• CKAN data catalog tool (more in your workshop)
• Data catalogs
  - data.gov, data.gov.uk, data.gov.au
  - Africa Open Data, Indonesia Data Portal
  - EU Data Portal
  - More here: https://ckan.org/instances
• CIARD RING federated data catalog, managed by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by a uniform type of content and a uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published
Sample dataset record
RDF of the record:
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
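A record like the one above can be read by an application with nothing more than an XML parser, as long as it is serialized in this flat `rdf:Description` style. The sketch below uses Python's standard `xml.etree.ElementTree` on an abridged copy of the record; the `rdf:RDF` wrapper with its namespace declarations, the exact URIs and the `summarize` function are assumptions added to make the fragment parseable, and a real application would more likely use a proper RDF library, since RDF/XML allows many equivalent serializations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Abridged copy of the sample record, wrapped so that the prefixes are declared.
RECORD = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dcat="{DCAT}" xmlns:dct="{DCT}">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="{DCAT}Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>
""".strip()


def summarize(rdf_xml: str) -> dict:
    """Map each described resource to its title and its distribution URIs."""
    root = ET.fromstring(rdf_xml)
    summary = {}
    for description in root.iter(f"{{{RDF}}}Description"):
        about = description.get(f"{{{RDF}}}about")
        summary[about] = {
            "title": description.findtext(f"{{{DCT}}}title"),
            "distributions": [
                element.get(f"{{{RDF}}}resource")
                for element in description.findall(f"{{{DCAT}}}distribution")
            ],
        }
    return summary


summary = summarize(RECORD)
```

An application would then follow each distribution URI to find the format, protocol and URL it needs to retrieve the data.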
References
1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986. 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets - On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Dataset description and DCAT
DRTCISI-ICSUCODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org
Dataset metadata for applications
1) General metadata about the dataset
   1) identifier(s)
   2) who is responsible for it
   3) when and where the data were collected
   4) relations to organizations, persons, publications, software, projects, funding…
   5) the conditions for re-use (rights, licenses)
   6) provenance, versions
   7) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
   8) time and space slices, subsets
   9) the “dimensions” and “variables” covered by the dataset
   10) the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
WHY dimensions and semantics
Example of a search by a researcher:
I’m doing research on plant phenotypes: give me all datasets of crop phenotypic data that include the dimensions of time, geographic location and height, plus the units of measure used for time and height, where geographic location is expressed as coordinates (because my software only processes coordinates).
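The researcher's request can be answered mechanically only if each dataset description lists its dimensions together with their semantics. The sketch below shows such a search over a tiny in-memory catalog. The catalog entries, the field names (`theme`, `dimensions`, `unit`, `syntax`) and the `search` function are all hypothetical and only illustrate the kind of metadata that would have to exist.

```python
CATALOG = [
    {
        "id": "trial-A",
        "theme": "crop phenotypes",
        "dimensions": {
            "time": {"unit": "day"},
            "geographic location": {"syntax": "coordinates"},
            "height": {"unit": "cm"},
        },
    },
    {
        "id": "trial-B",
        "theme": "crop phenotypes",
        "dimensions": {
            "time": {"unit": "year"},
            "geographic location": {"syntax": "place name"},
            "height": {"unit": "m"},
        },
    },
    {
        "id": "survey-C",
        "theme": "soil",
        "dimensions": {"geographic location": {"syntax": "coordinates"}},
    },
]


def search(catalog, theme, required_dimensions, constraints):
    """Return (dataset id, units per dimension) for every matching dataset."""
    hits = []
    for dataset in catalog:
        dims = dataset["dimensions"]
        if dataset["theme"] != theme:
            continue
        if not all(name in dims for name in required_dimensions):
            continue
        if not all(
            dims.get(name, {}).get(key) == wanted
            for name, spec in constraints.items()
            for key, wanted in spec.items()
        ):
            continue
        # Report the units of measure for the dimensions that declare one.
        units = {name: dims[name]["unit"] for name in required_dimensions if "unit" in dims[name]}
        hits.append((dataset["id"], units))
    return hits


results = search(
    CATALOG,
    theme="crop phenotypes",
    required_dimensions=["time", "geographic location", "height"],
    constraints={"geographic location": {"syntax": "coordinates"}},
)
```

Only the first dataset matches: the second expresses location as a place name, and the third is not about crop phenotypes.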
Data serializations
• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…
• In many dataset description models, the metadata about the data serialization are attached to the dataset
• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is not attached to the dataset but rather to related entities called “distributions”
Distribution metadata for applications
Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:
1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset:
   - format (file format, data format)
   - protocol, API parameters…
And, if different for different distributions, again:
3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies), if the semantics differ between distributions
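How an application might use this distribution metadata can be sketched as a simple selection step: given the distributions of a dataset, pick the first one whose format and protocol the application supports. The dictionary keys, the example URLs, the protocol labels and the `pick_distribution` function are hypothetical; in a real catalog the format and protocol values would ideally be standardized URIs rather than free text.

```python
# Hypothetical distributions of one dataset.
DISTRIBUTIONS = [
    {
        "url": "http://example.org/data/sparql",
        "format": "application/sparql-results+json",
        "protocol": "SPARQL",
    },
    {
        "url": "http://example.org/data/dump.csv",
        "format": "text/csv",
        "protocol": "HTTP download",
    },
]


def pick_distribution(distributions, supported_formats, supported_protocols):
    """Return the first distribution the application can retrieve and parse, or None."""
    for distribution in distributions:
        if (distribution["format"] in supported_formats
                and distribution["protocol"] in supported_protocols):
            return distribution
    return None


# An application that can only download and parse CSV files.
chosen = pick_distribution(DISTRIBUTIONS, {"text/csv"}, {"HTTP download"})
# An application that only understands a format nobody offers.
nothing = pick_distribution(DISTRIBUTIONS, {"application/x-netcdf"}, {"HTTP download"})
```

If the metadata lacks either the format or the protocol, this selection cannot be automated and a person has to inspect each distribution by hand.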
bull DataCube httppurlorglinked-datacube
bull VOID httprdfsorgnsvoid-guide
bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml
bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology
bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables
bull CKAN httpckanorg
bull Dataverse httpdataverseorg
bull Datahub httpdatahubio
bull DataCite httpsearchdataciteorguiq=subject3Aagriculture
bull Re3data httpwwwre3dataorg
bull OpenAIRE httpswwwopenaireeu
bull CIARD RING httpringciardinfo
Dataset description and DCAT
DRTCISI-ICSUCODATAInternational Workshop on
Open Data Repositories
Thank you
Valeria Pesce
valeriapescefaoorg
Data serializations
• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…
• In many dataset description models, the metadata about the data serialization are attached to the dataset
• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is attached not to the dataset but to related entities called “distributions”
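
The second model can be pictured as a small data structure. The Python sketch below is only an illustration: the class names, field names and example URLs are assumptions made for this example and do not come from any specific vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    """One serialization of a dataset (hypothetical field names)."""
    access_url: str
    media_type: str                  # e.g. "text/csv"
    protocol: str = "HTTP download"
    license: Optional[str] = None    # only set when it differs from the dataset's


@dataclass
class Dataset:
    """Metadata about the collection as a whole."""
    identifier: str
    title: str
    publisher: str
    license: Optional[str] = None
    distributions: List[Distribution] = field(default_factory=list)

    def formats(self) -> List[str]:
        """List the media types in which the dataset is available."""
        return [d.media_type for d in self.distributions]


# Placeholder URIs: none of these resolve to real resources.
soil = Dataset(
    identifier="http://example.org/dataset/soil",
    title="Example soil dataset",
    publisher="Example Institute",
    license="http://example.org/licenses/open",
)
soil.distributions.append(Distribution("http://example.org/soil.csv", "text/csv"))
soil.distributions.append(
    Distribution("http://example.org/sparql", "application/rdf+xml", protocol="SPARQL")
)
print(soil.formats())
```

Descriptive metadata (title, publisher) stay on the dataset, while anything that can change from one serialization to another lives on the distribution.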
Distribution metadata for applications
Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:
1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset:
  - format (file format, data format)
  - protocol, API parameters…
And, if different for different distributions, again:
3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies), if the semantics differ between distributions
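
As a rough illustration of how an application might use such metadata, the Python sketch below picks the first distribution it is able to consume. The dictionary keys (`url`, `format`, `protocol`, `license`), the supported values and the URLs are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# What this (hypothetical) application is able to handle.
SUPPORTED_FORMATS = {"text/csv", "application/json"}
SUPPORTED_PROTOCOLS = {"HTTP download"}


def pick_distribution(distributions: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first distribution whose format and protocol are supported."""
    for dist in distributions:
        if (dist.get("format") in SUPPORTED_FORMATS
                and dist.get("protocol") in SUPPORTED_PROTOCOLS):
            return dist
    return None


distributions = [
    {"url": "http://example.org/sparql",
     "format": "application/rdf+xml", "protocol": "SPARQL"},
    {"url": "http://example.org/soil.csv",
     "format": "text/csv", "protocol": "HTTP download",
     "license": "http://example.org/licenses/open"},
]

chosen = pick_distribution(distributions)
print(chosen["url"] if chosen else "no usable distribution")
```

The selection only works if format and protocol are recorded per distribution and with predictable values.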
WHY protocols and API params?
Example: some datasets are available behind a service. For instance, RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, and national research institutes provide access to datasets behind a web service that accepts parameters to filter the datasets.
Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, provided it knows the parameters required by each SOAP service and the required syntax and type of those parameters.
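
The use case can be sketched in Python as follows. For brevity the sketch substitutes a plain HTTP GET service for SOAP; the service description format, the endpoint and the parameter names are hypothetical and only show what a client would need to know about the parameters.

```python
from urllib.parse import urlencode

# Hypothetical machine-readable description of a service-based distribution.
service = {
    "endpoint": "http://example.org/api/observations",
    "protocol": "HTTP GET",
    "parameters": {
        "country": {"type": str, "required": True},
        "year": {"type": int, "required": True},
        "crop": {"type": str, "required": False},
    },
}


def build_request(description: dict, values: dict) -> str:
    """Validate values against the described parameters and build the request URL."""
    params = description["parameters"]
    for name, spec in params.items():
        if spec["required"] and name not in values:
            raise ValueError(f"missing required parameter: {name}")
    for name, value in values.items():
        if name not in params:
            raise ValueError(f"unknown parameter: {name}")
        if not isinstance(value, params[name]["type"]):
            raise TypeError(f"parameter {name} must be {params[name]['type'].__name__}")
    return description["endpoint"] + "?" + urlencode(values)


print(build_request(service, {"country": "IT", "year": 1990}))
```

With such a description available for each distribution, the same client code could query several services without hard-coded knowledge of each one.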
General issue with all metadata
Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.
- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list
And… there is no authority “value vocabulary” or code list for many of these values
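
A minimal Python sketch of the kind of check that standardized values make possible. The code list below is invented for the example and its URIs are placeholders, not a real authority list.

```python
# Hypothetical code list: placeholder URIs mapped to human-readable labels.
FORMAT_CODE_LIST = {
    "http://example.org/formats/csv": "CSV",
    "http://example.org/formats/rdf-xml": "RDF/XML",
}


def check_values(record: dict, field: str, code_list: dict) -> list:
    """Return the values of `field` that are not URIs from the code list."""
    values = record.get(field, [])
    return [v for v in values if v not in code_list]


record = {
    "title": "Example soil dataset",
    "format": ["http://example.org/formats/csv", "csv file"],
}

# Free-text values such as "csv file" cannot be matched across catalogs.
print(check_values(record, "format", FORMAT_CODE_LIST))
```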
No out-of-the-box solution
• Do existing data catalog tools normally cover these metadata? NO
• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO
BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites” (from the W3C page on “Best Practices for Publishing Linked Data”)
Dataset description vocabularies
Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability
Semantic interoperability
In this presentation we cover only RDF vocabularies, with a special focus on semantic interoperability.
Dataset metadata were managed in several ways before semantic technologies: see NetCDF or HDF5 structures and the various hierarchical, array-based structures used especially in observation datasets, with dataset metadata at the top and data arrays below.
Dataset description vocabularies
• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about dataset (collection) and related distributions
• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services
Definition of “dataset” in DCAT
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.
The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.
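
To make the dataset/distribution split concrete, the Python sketch below builds a small JSON-LD description using DCAT and Dublin Core terms (`dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:downloadURL`, `dcat:accessURL`, `dcat:mediaType`, `dct:title`). The dataset itself and all `example.org` URLs are invented for illustration.

```python
import json

dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/dataset/soil",
    "@type": "dcat:Dataset",
    "dct:title": "Example soil dataset",
    "dcat:distribution": [
        {
            # A downloadable file
            "@id": "http://example.org/dataset/soil/csv",
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": "http://example.org/soil.csv"},
            "dcat:mediaType": "text/csv",
        },
        {
            # An API giving access to the same data
            "@id": "http://example.org/dataset/soil/api",
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": "http://example.org/api/soil"},
        },
    ],
}

print(json.dumps(dataset, indent=2))
```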
DCAT model
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) format
12) protocols, parameters: NO
DCAT and DCAT-AP
The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.
It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.
The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.
A diagram of the full DCAT-AP specification is on the next slide.
Full DCAT-AP
[Diagram of the full DCAT-AP model, with callouts: versions; rights and provenance; standards; rights; format; relation]
More than DCAT:
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO
Limitations of DCAT
It doesn’t cover:
• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax and semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets
Combining DCAT with other vocabularies
“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.” (from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures.
• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.
More than DCAT-AP:
1) dimensions and semantics: YES
2) slices, subsets: YES
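
The roles of dimensions, measures, attributes and slices can be illustrated with plain Python data. This is not the DataCube RDF syntax: the component names and all values below are made up for the example.

```python
# Component roles: dimensions identify an observation, measures hold the
# observed value, attributes qualify it (here, the unit of measure).
structure = {
    "dimensions": ["refPeriod", "refArea", "sex"],
    "measures": ["lifeExpectancy"],
    "attributes": {"unitMeasure": "years"},
}

observations = [
    {"refPeriod": "2004-2006", "refArea": "Area A", "sex": "male", "lifeExpectancy": 76.7},
    {"refPeriod": "2004-2006", "refArea": "Area A", "sex": "female", "lifeExpectancy": 80.7},
    {"refPeriod": "2005-2007", "refArea": "Area A", "sex": "male", "lifeExpectancy": 77.1},
]


def is_valid(obs: dict) -> bool:
    """An observation must give a value for every dimension and every measure."""
    required = structure["dimensions"] + structure["measures"]
    return all(key in obs for key in required)


def slice_by(fixed: dict) -> list:
    """A slice fixes the value of some dimensions and keeps the matching observations."""
    return [o for o in observations if all(o[k] == v for k, v in fixed.items())]


print(all(is_valid(o) for o in observations))
print(len(slice_by({"refPeriod": "2004-2006"})))
```

Because the structure is described separately from the observations, an application can validate or subset the data without prior knowledge of the dataset.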
VOID model
[Diagram of the VOID model; licensing callout: dct:license, wv:norms, wv:waiver]
More than DCAT-AP:
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
Complementing DCAT
• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI
• For API aspects
  • VOID (linked data)
  • Web service descriptions (Hydra, WSDL, WADL)
• For relations to organizations, projects, publications, funding…
  • CERIF for datasets
  • VIVO Datastar
Many vocabularies
Vocabularies with relations to DCAT or same model
• DCAT-AP and other extensions
  • W3C HCLS dataset descriptions
  • DataID: extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information
  • GeoDCAT-AP: geospatial extension of DCAT
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org
Other vocabularies
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE
Examples of application of DCAT
• CKAN data catalog tool (more in your workshop)
• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances
• CIARD RING: federated data catalog managed by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by a uniform type of content and a uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published
Sample dataset record
RDF of the record:
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500,000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
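
As a hedged illustration of how an application could read such a record, the Python sketch below parses an abridged copy of it with the standard library. The `rdf:RDF` wrapper and the namespace declarations are added here as assumptions, since the slide shows only the `rdf:Description` element, and the URLs are written as best reconstructed from the slide. A real application would more likely use an RDF library, because plain XML parsing only works for this flat serialization.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Abridged record; wrapper element and xmlns declarations are assumed.
RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree addresses namespaced attributes as "{namespace}localname".
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

root = ET.fromstring(RECORD)
for description in root.findall("rdf:Description", NS):
    title = description.findtext("dct:title", namespaces=NS)
    distributions = [
        el.get(RDF_RESOURCE) for el in description.findall("dcat:distribution", NS)
    ]
    print(title)
    print(distributions)
```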
References
1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC)
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09)
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry: https://github.com/der/ukl-registry-poc/wiki
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Dataset description and DCAT
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org
WHY protocols and API params
Example
Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets
Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters
General issue with all metadata
Standardization of the values eg for ldquothematic coveragerdquo or ldquodimensionsrdquo of datasets ldquoformatrdquo or ldquoprotocol usedrdquo of distributions etc
- The value should be standardized possibly a URI
- The value should be part of an authority list code list
Andhellip There is no authority ldquovalue vocabularyrdquo or code list for many of these values
No out-of-the-box solution
bull Do existing data catalog tools normally cover these metadataNO
bull Do existing metadata vocabularies (RDF and not) cover all these metadata Or do they adopt the same modelNO
BUT by using even basic metadata to describe datasets in data catalogs ldquopublishers increase discoverability and enable applications to consume metadata from multiple catalogs This further enables decentralized publishing of catalogs and facilitates federated dataset search across sitesrdquo(from W3C page on ldquoBest Practices for Publishing Linked Datardquo)
Dataset description vocabularies
Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed
for full interoperability
Semantic interoperability
In this presentation we cover only RDF vocabularies with special focus on semantic interoperability
Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below
Dataset description vocabularies
bull DCAT vocabularybull RDF vocabulary for describing any dataset
bull Datasets can be standalone or part of a ldquocatalogrdquo
bull Metadata about dataset (collection) and related distributions
bull DataCube vocabularybull RDF vocabulary for describing statistical datasets
bull Useful for attaching metadata about the ldquodata structurerdquo of a dataset
bull VOID vocabularybull RDF vocabulary for expressing metadata about RDF datasets
bull Useful especially for metadata related to RDF data services
Definition of ldquodatasetrdquo in DCAT
Definition given by the W3C Government Linked Data Working Group
A dataset is ldquoa collection of data published or curated by a single source and available for access or download in one or more formatsrdquo
The ldquoinstancesrdquo of the dataset ldquoavailable for access or download in one or more formatsrdquo are called ldquodistributionsrdquo A dataset can have many distributions
Examples of distributions include a downloadable CSV file an API or an RSS feed
c
DCAT model
1) identifier(s)2) who is responsible for it3) when and where the data were
collected4) relations to organizations persons
publications software projects fundinghellip NO
5) the conditions for re-use (rights licenses)
6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO
DCAT and DCAT-AP
The DCAT Application profile for data portals in
Europe (DCAT-AP) is an extension of DCAT
It combines DCAT with the W3C Asset Description Metadata
Schema (ADMS) vocabulary plus classes and properties from
Dublin Core SKOS and Vcard in an Application profile
The elaboration of the DCAT-AP was a joint initiative of DG
CONNECT the EU Publications Office and the ISA Programme
A diagram of the full DCAT-AP specification is on the next slide
Full DCAT AP
versions
rights and provenance
standards
rights
format
relation
1) who is responsible for it MORE2) relations to organizations projects publications funding Partly
3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO
More than DCAT
Limitations of DCAT
It doesnrsquot cover
bull semantic relations to organisations persons software projects fundinghellip
bull dimensions and variables and syntax semantics of dimensions and variables
bull protocols and parameters for datasets available through APIs
bull time and space slices subsets
Combining DCAT with other vocabularies
ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions attributes and measures
bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)
bull The measure components represent the phenomenon being observed (eg life expectancy)
bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.
1) dimensions and semantics: YES
2) slices, subsets: YES
More than DCAT-AP
VOID model
[Diagram: VOID model. Licensing properties shown: dct:license, wv:norms, wv:waiver]
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
(MORE = more than DCAT-AP)
Complementing DCAT
• For dimensions, semantics of dimensions, slicing:
  • DataCube
  • DDI
• For API aspects:
  • VOID (linked data)
  • Web services descriptions (Hydra (WSDL, WADL))
• For relations to organizations, projects, publications, funding…:
  • CERIF for datasets
  • VIVO Datastar
Many vocabularies
Vocabularies with relations to DCAT or the same model:
• DCAT-AP and other extensions
• W3C HCLS dataset descriptions; DataID (extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information); GeoDCAT-AP (geospatial extension of DCAT)
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org
Other vocabularies:
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE
Examples of application of DCAT
• CKAN data catalog tool (more in your workshop)
• Data catalogs:
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances
• CIARD RING: federated data catalog managed by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published
Sample dataset record
RDF of the record:
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
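To show how an application might consume such a record, here is a minimal sketch using only Python's standard library. It embeds an abbreviated copy of the record above. The `rdf:RDF` wrapper and the namespace declarations are assumptions added so that the snippet parses on its own (the slide does not show them), and the URLs are as reconstructed from the slide text.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; the prefix bindings are assumed, as the slide omits them.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Abbreviated copy of the sample record, wrapped so it is well-formed XML.
record_xml = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dct="{DCT}" xmlns:dcat="{DCAT}">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>
""".strip()

root = ET.fromstring(record_xml)
description = root.find(f"{{{RDF}}}Description")

# Subject URI, title, declared types and distributions of the dataset.
subject = description.get(f"{{{RDF}}}about")
title = description.findtext(f"{{{DCT}}}title")
types = [t.get(f"{{{RDF}}}resource")
         for t in description.findall(f"{{{RDF}}}type")]
distributions = [d.get(f"{{{RDF}}}resource")
                 for d in description.findall(f"{{{DCAT}}}distribution")]
```

A real consumer would more likely use an RDF library than raw XML parsing. The point here is only that one resource is typed as a dataset in several vocabularies at once and points to its distributions by URI.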
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Dataset description and DCAT
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org
General issue with all metadata
Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.
- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list
And… there is no authority “value vocabulary” or code list for many of these values
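As an illustration of what standardizing values against a code list means in practice, here is a minimal Python sketch. The code list `FORMAT_CODE_LIST`, its `example.org` URIs and the function `standardize` are hypothetical. They stand in for whatever authority list a community agrees on, which, as noted above, often does not exist yet.

```python
# Hypothetical code list: free-text "format" labels mapped to URIs.
FORMAT_CODE_LIST = {
    "csv": "http://example.org/formats/csv",
    "rdf/xml": "http://example.org/formats/rdf-xml",
    "json": "http://example.org/formats/json",
}


def standardize(value, code_list):
    """Return the URI for a free-text value, or None if it is not listed."""
    return code_list.get(value.strip().lower())


# Values as they might be typed by different data providers.
raw_values = ["CSV", " rdf/xml ", "Excel sheet"]

standardized = {v: standardize(v, FORMAT_CODE_LIST) for v in raw_values}

# Values with no entry in the code list need curation or a new code.
unmapped = [v for v, uri in standardized.items() if uri is None]
```

Values that end up in `unmapped` are exactly the cases where a shared value vocabulary is missing.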
No out-of-the-box solution
• Do existing data catalog tools normally cover these metadata? NO
• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO
BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites.” (from the W3C page on “Best Practices for Publishing Linked Data”)
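The idea of federated search over catalogs that share even basic metadata can be sketched as follows. This is a toy illustration in Python: the two catalogs, their records, the field names (`identifier`, `title`, `keywords`) and the function `federated_search` are all invented for the example and do not correspond to any real catalog API.

```python
# Two hypothetical catalogs exposing the same basic metadata fields.
catalog_a = [
    {"identifier": "a-1", "title": "Soil map", "keywords": ["soil", "maps"]},
    {"identifier": "a-2", "title": "Rainfall observations", "keywords": ["climate"]},
]
catalog_b = [
    {"identifier": "b-1", "title": "National soil database", "keywords": ["soil"]},
]


def federated_search(catalogs, keyword):
    """Return (catalog, identifier, title) for every record tagged with keyword."""
    hits = []
    for catalog_name, records in catalogs.items():
        for record in records:
            if keyword in record["keywords"]:
                hits.append((catalog_name, record["identifier"], record["title"]))
    return hits


results = federated_search({"A": catalog_a, "B": catalog_b}, "soil")
```

The search works across both catalogs only because they use the same field names and comparable keyword values, which is the role a shared vocabulary such as DCAT plays across real catalogs.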
Dataset description vocabularies
Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability.
Semantic interoperability
In this presentation we cover only RDF vocabularies, with a special focus on semantic interoperability.
Dataset metadata were managed in several ways before semantic technologies: see NetCDF or HDF5 structures and various hierarchical array-based structures, used especially in observation datasets, which include dataset metadata at the top and data arrays below.
Dataset description vocabularies
• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about dataset (collection) and related distributions
• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services
Definition of “dataset” in DCAT
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.
The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.
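The dataset/distribution relationship can be sketched as a small data model. This is an illustrative Python sketch, not an implementation of DCAT: the class and field names (`Dataset`, `Distribution`, `access_url`, `media_type`), the publisher name and the `example.org` URLs are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    """One concrete way of accessing a dataset (hypothetical fields)."""
    access_url: str
    media_type: str


@dataclass
class Dataset:
    """A collection of data from a single source, with its distributions."""
    title: str
    publisher: str
    distributions: list = field(default_factory=list)

    def formats(self):
        # Distinct media types in which the dataset is available, sorted.
        return sorted({d.media_type for d in self.distributions})


dataset = Dataset(title="Soil observations", publisher="Example Agency")

# The same dataset offered as a CSV download, an API and an RSS feed.
dataset.distributions.append(
    Distribution("http://example.org/data/soil.csv", "text/csv"))
dataset.distributions.append(
    Distribution("http://example.org/api/soil", "application/json"))
dataset.distributions.append(
    Distribution("http://example.org/feeds/soil.rss", "application/rss+xml"))
```

Descriptive metadata such as title and publisher sit on the dataset, while access details such as URL and format sit on each distribution.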
DCAT model
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO
Sample dataset record
RDF of the record:
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
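To illustrate how a record of this kind can be consumed, the sketch below reads a reduced copy of it with Python's standard XML parser and lists its statements as (subject, predicate, object) tuples. The `rdf:RDF` wrapper, the namespace declarations and the punctuation of the URIs are reconstructions added here so that the fragment parses, not a verbatim copy of the RING export. This is not a complete RDF/XML parser: it handles only flat `rdf:Description` elements like the one shown, and a dedicated RDF library would be the usual choice in practice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Reduced, reconstructed copy of the sample record (wrapper and xmlns added).
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""


def read_description(xml_text):
    """Return (subject, predicate, object) tuples for one flat rdf:Description."""
    root = ET.fromstring(xml_text)
    desc = root.find("{%s}Description" % RDF_NS)
    subject = desc.get("{%s}about" % RDF_NS)
    triples = []
    for child in desc:
        # ElementTree writes tags as "{namespace}local"; joining the two
        # parts gives the full predicate URI.
        predicate = child.tag[1:].replace("}", "")
        obj = child.get("{%s}resource" % RDF_NS)
        if obj is None:  # no rdf:resource attribute: the value is a literal
            obj = (child.text or "").strip()
        triples.append((subject, predicate, obj))
    return triples


for triple in read_description(RECORD):
    print(triple)
```

The two `rdf:type` statements show the point made on the RING slides: the same resource is declared both a DCAT dataset and a VOID dataset.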
References
1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Dataset description and DCAT
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org
Semantic interoperability
In this presentation we cover only RDF vocabularies, with a special focus on semantic interoperability.
Dataset metadata were managed in several ways before semantic technologies: see NetCDF or HDF5 structures and the various hierarchical, array-based structures used especially in observation datasets, which place the dataset metadata at the top and the data arrays below.
Dataset description vocabularies
• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about the dataset (collection) and related distributions
• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services
Definition of “dataset” in DCAT
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.
The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.
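The dataset/distribution split can be sketched as a small data model that emits DCAT statements. This is a minimal illustration rather than a reference implementation: the class names, the `example.org` URIs and the sample values are hypothetical, while the terms `dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:accessURL` and `dcat:mediaType` come from the DCAT vocabulary. Literals and URIs are both kept as plain strings for brevity.

```python
from dataclasses import dataclass, field

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Distribution:
    uri: str
    access_url: str
    media_type: str


@dataclass
class Dataset:
    uri: str
    title: str
    distributions: list = field(default_factory=list)

    def triples(self):
        """Yield (subject, predicate, object) statements for the dataset."""
        yield (self.uri, RDF_TYPE, DCAT + "Dataset")
        yield (self.uri, DCT + "title", self.title)
        for dist in self.distributions:
            yield (self.uri, DCAT + "distribution", dist.uri)
            yield (dist.uri, RDF_TYPE, DCAT + "Distribution")
            yield (dist.uri, DCAT + "accessURL", dist.access_url)
            yield (dist.uri, DCAT + "mediaType", dist.media_type)


# One dataset exposed three ways, mirroring the examples on the slide.
example = Dataset(
    uri="http://example.org/dataset/soil",
    title="Example soil dataset",
    distributions=[
        Distribution("http://example.org/dataset/soil/csv",
                     "http://example.org/files/soil.csv", "text/csv"),
        Distribution("http://example.org/dataset/soil/api",
                     "http://example.org/api/soil", "application/json"),
        Distribution("http://example.org/dataset/soil/rss",
                     "http://example.org/feeds/soil.rss", "application/rss+xml"),
    ],
)

for s, p, o in example.triples():
    print(s, p, o)
```

Descriptive metadata such as the title is attached once to the dataset, while access details live on each distribution.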
DCAT model
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) format
12) protocols, parameters: NO
DCAT and DCAT-AP
The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.
It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.
The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.
A diagram of the full DCAT-AP specification is on the next slide.
Full DCAT AP
[Diagram of the full DCAT-AP specification; callouts: versions, rights and provenance, standards, rights, format, relation]
What DCAT-AP covers beyond DCAT:
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO
Limitations of DCAT
It doesn’t cover:
• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax and semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets
Combining DCAT with other vocabularies
“Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures:
• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors, and metadata such as the status of the observation (e.g. estimated, provisional)
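As a rough sketch of how such a structure might be declared, the helper below writes a `qb:DataStructureDefinition` in Turtle from three lists of component names. The terms `qb:DataStructureDefinition`, `qb:component`, `qb:dimension`, `qb:measure` and `qb:attribute` belong to the Data Cube vocabulary; the function name, the `ex:` namespace and the property names (`refPeriod`, `refArea`, and so on) are placeholders chosen for this example and are not taken from the presentation.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/ns#"  # hypothetical namespace for the example


def structure_definition(name, dimensions, measures, attributes):
    """Return Turtle text declaring a qb:DataStructureDefinition."""
    specs = (
        [("qb:dimension", d) for d in dimensions]
        + [("qb:measure", m) for m in measures]
        + [("qb:attribute", a) for a in attributes]
    )
    if not specs:
        raise ValueError("a structure definition needs at least one component")
    components = " ;\n".join(
        "    qb:component [ %s ex:%s ]" % (kind, local) for kind, local in specs
    )
    return "\n".join([
        "@prefix qb: <%s> ." % QB,
        "@prefix ex: <%s> ." % EX,
        "",
        "ex:%s a qb:DataStructureDefinition ;" % name,
        components + " .",
    ])


turtle = structure_definition(
    "dsdLifeExpectancy",
    dimensions=["refPeriod", "refArea", "sex"],
    measures=["lifeExpectancy"],
    attributes=["unitMeasure"],
)
print(turtle)
```

Here time, region and gender are dimensions, life expectancy is the measure, and the unit of measure is an attribute, following the roles described in the bullets above.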
c
DCAT model
1) identifier(s)2) who is responsible for it3) when and where the data were
collected4) relations to organizations persons
publications software projects fundinghellip NO
5) the conditions for re-use (rights licenses)
6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO
DCAT and DCAT-AP
The DCAT Application profile for data portals in
Europe (DCAT-AP) is an extension of DCAT
It combines DCAT with the W3C Asset Description Metadata
Schema (ADMS) vocabulary plus classes and properties from
Dublin Core SKOS and Vcard in an Application profile
The elaboration of the DCAT-AP was a joint initiative of DG
CONNECT the EU Publications Office and the ISA Programme
A diagram of the full DCAT-AP specification is on the next slide
Full DCAT-AP
[Diagram of the full DCAT-AP specification. Diagram labels: versions; rights and provenance; standards; rights; format; relation]
More than DCAT:
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO
Limitations of DCAT
It doesn't cover:
• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax and semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets
Combining DCAT with other vocabularies
“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”
(from the DCAT specification)
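The combination described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dataset URI, title, triple count and SPARQL endpoint are hypothetical, and the helper name `describe_rdf_dataset` is invented for this sketch. The vocabulary terms used (dcat:Dataset, void:Dataset, void:triples, void:sparqlEndpoint, dct:title) are taken from the DCAT, VoID and Dublin Core vocabularies.

```python
# Sketch: one dataset described with generic DCAT terms plus RDF-specific VoID terms.
DCAT = "http://www.w3.org/ns/dcat#"
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def describe_rdf_dataset(uri, title, triple_count, sparql_endpoint):
    """Return N-Triples lines that mix DCAT and VoID for the same resource.

    Assumes the title contains no characters that need escaping.
    """
    return [
        # The same resource is typed in both vocabularies.
        f"<{uri}> <{RDF_TYPE}> <{DCAT}Dataset> .",
        f"<{uri}> <{RDF_TYPE}> <{VOID}Dataset> .",
        # Generic descriptive metadata (Dublin Core, as used by DCAT).
        f'<{uri}> <{DCT}title> "{title}" .',
        # RDF-specific details that DCAT itself does not cover.
        f'<{uri}> <{VOID}triples> "{triple_count}"^^<{XSD_INTEGER}> .',
        f"<{uri}> <{VOID}sparqlEndpoint> <{sparql_endpoint}> .",
    ]


# Hypothetical example values, not taken from any real catalog.
lines = describe_rdf_dataset(
    "http://example.org/dataset/soil",
    "Example soil dataset",
    1200,
    "http://example.org/sparql",
)
for line in lines:
    print(line)
```

The point of the sketch is that nothing has to be chosen exclusively: a catalog can read the DCAT statements and ignore the rest, while a linked data client can use the VoID statements.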
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures.
• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
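The three kinds of components can be illustrated with a small Python sketch. It does not use the Data Cube vocabulary itself; the class name `CubeStructure`, the component names and the observation values are all hypothetical and only mirror the examples given in the bullets above.

```python
from dataclasses import dataclass, field


@dataclass
class CubeStructure:
    """A simplified stand-in for a data structure definition."""
    dimensions: list                                  # identify what is observed
    measures: list                                    # the phenomenon being observed
    attributes: list = field(default_factory=list)    # units, status, scaling...

    def missing_components(self, observation: dict) -> list:
        """Return required components (dimensions and measures) absent from an observation."""
        required = self.dimensions + self.measures
        return [name for name in required if name not in observation]


structure = CubeStructure(
    dimensions=["time", "region", "gender"],
    measures=["life_expectancy"],
    attributes=["unit", "status"],
)

# Hypothetical observations, for illustration only.
complete = {
    "time": "2010", "region": "Wales", "gender": "female",
    "life_expectancy": 81.3, "unit": "years", "status": "estimated",
}
incomplete = {"time": "2010", "life_expectancy": 80.0}

print(structure.missing_components(complete))
print(structure.missing_components(incomplete))
```

In this simplified model every observation must give a value for each dimension and measure, while attributes are treated as optional qualifiers.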
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.
More than DCAT-AP:
1) dimensions and semantics: YES
2) slices, subsets: YES
VOID model
[Diagram of the VoID model. Diagram labels: dct:license, wv:norms, wv:waiver]
More than DCAT-AP:
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
Complementing DCAT
• For dimensions, semantics of dimensions, slicing:
  • DataCube
  • DDI
• For API aspects:
  • VOID (linked data)
  • Web services descriptions (Hydra (WSDL, WADL))
• For relations to organizations, projects, publications, funding…:
  • CERIF for datasets
  • VIVO Datastar
Many vocabularies
Vocabularies with relations to DCAT or same model
• DCAT-AP and other extensions
  • W3C HCLS dataset descriptions
  • DataID: extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information
  • GeoDCAT-AP: geospatial extension of DCAT
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org
Other vocabularies
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE
Examples of application of DCAT
• CKAN data catalog tool (more in your workshop)
• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances
• CIARD RING: federated data catalog managed by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published
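The dataset / distribution split in the bullets above can be sketched as a small data model. This is a hypothetical Python illustration, not the RING's actual implementation: the class names, field names and example URLs are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Distribution:
    """One way of getting at the data: a format, a protocol and a URL."""
    fmt: str
    protocol: str
    url: str


@dataclass
class Dataset:
    """A dataset with uniform content and structure, exposed through several distributions."""
    title: str
    distributions: list = field(default_factory=list)

    def urls_for(self, fmt):
        """Return the access URLs of all distributions in the given format."""
        return [d.url for d in self.distributions if d.fmt == fmt]


# Hypothetical record: the same data offered as a CSV file, an RDF dump and a SPARQL endpoint.
soil = Dataset(
    "Example soil dataset",
    [
        Distribution("CSV", "HTTP download", "http://example.org/soil.csv"),
        Distribution("RDF/XML", "HTTP download", "http://example.org/soil.rdf"),
        Distribution("RDF/XML", "SPARQL", "http://example.org/sparql"),
    ],
)

print(soil.urls_for("CSV"))
print(soil.urls_for("RDF/XML"))
```

Keeping format, protocol and URL on the distribution rather than on the dataset is what lets one catalog record describe the content once and list several ways of accessing it.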
Sample dataset record
RDF of the record:
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database: representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database: representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database: representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
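A record of this kind can be inspected with a few lines of Python. The sketch below is only an illustration: it uses the standard library XML parser rather than a real RDF parser, so it only works for flat RDF/XML shaped like the record above, and the embedded record is a shortened, hypothetical example with made-up example.org URIs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical, shortened record wrapped in rdf:RDF so the prefixes are declared.
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dct="{DCT}" xmlns:dcat="{DCAT}">
  <rdf:Description rdf:about="http://example.org/node/1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>Example soil dataset</dct:title>
    <dcat:distribution rdf:resource="http://example.org/distribution/1"/>
  </rdf:Description>
</rdf:RDF>"""


def summarize(rdf_xml):
    """Extract subject URI, types, title and distributions from a flat RDF/XML record."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF}}}Description")
    return {
        "about": desc.get(f"{{{RDF}}}about"),
        # A record may carry several rdf:type statements, one per vocabulary.
        "types": [t.get(f"{{{RDF}}}resource") for t in desc.findall(f"{{{RDF}}}type")],
        "title": desc.findtext(f"{{{DCT}}}title"),
        "distributions": [
            d.get(f"{{{RDF}}}resource") for d in desc.findall(f"{{{DCAT}}}distribution")
        ],
    }


summary = summarize(RECORD)
print(summary)
```

The several rdf:type values show how the same resource is declared in more than one vocabulary at once, so consumers that know only one of them can still recognise it as a dataset.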
References
1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (U.S.), Space Science Board. National Academies, 1986. 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC)
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with 18th International World Wide Web Conference (WWW 09)
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group: http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry: https://github.com/der/ukl-registry-poc/wiki
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Dataset description and DCAT
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org
DCAT and DCAT-AP
The DCAT Application profile for data portals in
Europe (DCAT-AP) is an extension of DCAT
It combines DCAT with the W3C Asset Description Metadata
Schema (ADMS) vocabulary plus classes and properties from
Dublin Core SKOS and Vcard in an Application profile
The elaboration of the DCAT-AP was a joint initiative of DG
CONNECT the EU Publications Office and the ISA Programme
A diagram of the full DCAT-AP specification is on the next slide
Full DCAT AP
versions
rights and provenance
standards
rights
format
relation
1) who is responsible for it MORE2) relations to organizations projects publications funding Partly
3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO
More than DCAT
Limitations of DCAT
It doesnrsquot cover
bull semantic relations to organisations persons software projects fundinghellip
bull dimensions and variables and syntax semantics of dimensions and variables
bull protocols and parameters for datasets available through APIs
bull time and space slices subsets
Combining DCAT with other vocabularies
ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions attributes and measures
bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)
bull The measure components represent the phenomenon being observed (eg life expectancy)
bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset also non-statistical
1) dimensions and semantics YES2) slices subsets YES
More than DCAT-AP
VOID model
c
dctlicensewvnorms wvwaiver
3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES
More than DCAT-AP
Complementing DCAT
bull For dimensions semantics of dimensions slicingbull DataCube
bull DDI
bull For API aspectsbull VOID (linked data)
bull Web services descriptions (Hydra (WSDL WADL))
bull For relations to organizations projects publications fundinghellipbull CERIF for datasets
bull VIVO Datastar
Many vocabulariesVocabularies with relations to DCAT or same model
bull DCAT-AP and other extensions
bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT
bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)
bull Schemaorg
Other vocabulariesbull DataCite and re3data
bull CERIF for datasets
bull VIVO Datastar
bull INSPIRE
Examples of application of DCAT
bull CKAN data catalog tool(more in your workshop)
bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances
bull CIARD RING federated data catalogmanaged by GFAR
CIARD RING
Datasets in the RING dataset hub
bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)
bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)
bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)
bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary
a ldquoRING DCAT profilerdquo will be published
Sample dataset record
RDF of the recordltrdfDescription rdfabout=httpringciardnetnode19517gt
ltrdftype rdfresource=httpwwww3orgnsdcatDatasetgt
ltrdftype rdfresource=httprdfsorgnsvoidDatasetgt
ltrdftype rdfresource=httpschemaorgDatasetgt
ltrdftype rdfresource=httpwwww3orgnsadmsAssetgt
ltdcttitlegtNational Soil Database representative Soil Systems geographyltns1titlegt
ltschemanamegtNational Soil Database representative Soil Systems geographyltns2namegt
ltdctdescriptiongtNational Soil Database representative Soil Systems geography (1500000) ltns1descriptiongt
ltdcatlandingPage rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltschemaurl rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltdctpublisher rdfresource=httpringciardnetnode19510gt
ltschemapublisher rdfresource=httpringciardnetnode19510gt
ltdctissued rdfdatatype=xsdgYeargt1990ltns1issuedgt
ltdctspatial rdfresource=httpringciardnettaxonomy_term326gt
ltdctspatialgtNationalltns1spatialgt
ltdcattheme rdfresource=httpringciardnettaxonomy_term2052gt
ltschemaabout rdfresource=httpringciardnettaxonomy_term2052gt
ltdctconformsTo rdfresource=httpringciardnetnode19239gt
ltdctidentifiergthttpringciardnetnode19517rdfltns1identifiergt
ltschemacontentLocation rdfresource=httpringciardnettaxonomy_term326gt
ltdcttype rdfresource=httpringciardnettaxonomy_term81gt
ltdcatcatalog rdfresource=httpringciardnetnode19436gt
ltdcatdistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltschemadistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltvoiddataDump rdfresource=httpringciardnetfield_collection_item5055gt
ltdctrights rdfresource=httpringciardnettaxonomy_term2053gt
ltrdfDescriptiongt
References1 Issues and Recommendations Associated with Distributed Computation and Data
Management Systems for the Space SciencesNational Research Council (US) Space Science BoardNational Academies 1986 - 111 paginehttpsbooksgooglecoukbooksid=h4krAAAAYAAJ
2 Sacchi S Wickett K M amp Renear A H (2010) Dataset definitions Champaign IL Center for Informatics Research in Science and Scholarship (Rep No CIRSSDATACONS--20101VER01+DCDC)
3 Alexander K Cyganiak R Hausenblas M amp Zhao J (2009) Describing Linked Datasets-On the Design and Usage of voiD theVocabulary of Interlinked Datasets In Linked Data on the Web Workshop (LDOW 09) in conjunction with 18th International World Wide Web Conference (WWW 09)
4 Renear A H Sacchi S Wickett K M (2010) Definitions of Dataset in the Scientific and Technical Literature httpmailasistorgasist2010proceedingsproceedingsASIST_AM10submissions240_Final_Submissionpdf
5 W3C Government Linked Data Working Grouphttpwwww3org2011gldwikiMain_Page
6 UK Gov Linked Data Working Group LD registryhttpsgithubcomderukl-registry-pocwiki
Vocabularies and catalogsbull DCAT httpwwww3orgTRvocab-dcat
bull DCAT AP httpsjoinupeceuropaeuassetdcat_application_profileasset_releasedcat-application-
profile-data-portals-europe-final
bull DataCube httppurlorglinked-datacube
bull VOID httprdfsorgnsvoid-guide
bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml
bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology
bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables
bull CKAN httpckanorg
bull Dataverse httpdataverseorg
bull Datahub httpdatahubio
bull DataCite httpsearchdataciteorguiq=subject3Aagriculture
bull Re3data httpwwwre3dataorg
bull OpenAIRE httpswwwopenaireeu
bull CIARD RING httpringciardinfo
Dataset description and DCAT
DRTCISI-ICSUCODATAInternational Workshop on
Open Data Repositories
Thank you
Valeria Pesce
valeriapescefaoorg
Full DCAT AP
versions
rights and provenance
standards
rights
format
relation
1) who is responsible for it MORE2) relations to organizations projects publications funding Partly
3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO
More than DCAT
Limitations of DCAT
It doesnrsquot cover
bull semantic relations to organisations persons software projects fundinghellip
bull dimensions and variables and syntax semantics of dimensions and variables
bull protocols and parameters for datasets available through APIs
bull time and space slices subsets
Combining DCAT with other vocabularies
ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions attributes and measures
bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)
bull The measure components represent the phenomenon being observed (eg life expectancy)
bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset also non-statistical
1) dimensions and semantics YES2) slices subsets YES
More than DCAT-AP
VOID model
c
dctlicensewvnorms wvwaiver
3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES
More than DCAT-AP
Complementing DCAT
bull For dimensions semantics of dimensions slicingbull DataCube
bull DDI
bull For API aspectsbull VOID (linked data)
bull Web services descriptions (Hydra (WSDL WADL))
bull For relations to organizations projects publications fundinghellipbull CERIF for datasets
bull VIVO Datastar
Many vocabulariesVocabularies with relations to DCAT or same model
bull DCAT-AP and other extensions
bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT
bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)
bull Schemaorg
Other vocabulariesbull DataCite and re3data
bull CERIF for datasets
bull VIVO Datastar
bull INSPIRE
Examples of application of DCAT
bull CKAN data catalog tool(more in your workshop)
bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances
bull CIARD RING federated data catalogmanaged by GFAR
CIARD RING
Datasets in the RING dataset hub
bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)
bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)
bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)
bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary
a ldquoRING DCAT profilerdquo will be published
Sample dataset record
RDF of the recordltrdfDescription rdfabout=httpringciardnetnode19517gt
ltrdftype rdfresource=httpwwww3orgnsdcatDatasetgt
ltrdftype rdfresource=httprdfsorgnsvoidDatasetgt
ltrdftype rdfresource=httpschemaorgDatasetgt
ltrdftype rdfresource=httpwwww3orgnsadmsAssetgt
ltdcttitlegtNational Soil Database representative Soil Systems geographyltns1titlegt
ltschemanamegtNational Soil Database representative Soil Systems geographyltns2namegt
ltdctdescriptiongtNational Soil Database representative Soil Systems geography (1500000) ltns1descriptiongt
ltdcatlandingPage rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltschemaurl rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltdctpublisher rdfresource=httpringciardnetnode19510gt
ltschemapublisher rdfresource=httpringciardnetnode19510gt
ltdctissued rdfdatatype=xsdgYeargt1990ltns1issuedgt
ltdctspatial rdfresource=httpringciardnettaxonomy_term326gt
ltdctspatialgtNationalltns1spatialgt
ltdcattheme rdfresource=httpringciardnettaxonomy_term2052gt
ltschemaabout rdfresource=httpringciardnettaxonomy_term2052gt
ltdctconformsTo rdfresource=httpringciardnetnode19239gt
ltdctidentifiergthttpringciardnetnode19517rdfltns1identifiergt
ltschemacontentLocation rdfresource=httpringciardnettaxonomy_term326gt
ltdcttype rdfresource=httpringciardnettaxonomy_term81gt
ltdcatcatalog rdfresource=httpringciardnetnode19436gt
ltdcatdistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltschemadistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltvoiddataDump rdfresource=httpringciardnetfield_collection_item5055gt
ltdctrights rdfresource=httpringciardnettaxonomy_term2053gt
ltrdfDescriptiongt
References1 Issues and Recommendations Associated with Distributed Computation and Data
Management Systems for the Space SciencesNational Research Council (US) Space Science BoardNational Academies 1986 - 111 paginehttpsbooksgooglecoukbooksid=h4krAAAAYAAJ
2 Sacchi S Wickett K M amp Renear A H (2010) Dataset definitions Champaign IL Center for Informatics Research in Science and Scholarship (Rep No CIRSSDATACONS--20101VER01+DCDC)
3 Alexander K Cyganiak R Hausenblas M amp Zhao J (2009) Describing Linked Datasets-On the Design and Usage of voiD theVocabulary of Interlinked Datasets In Linked Data on the Web Workshop (LDOW 09) in conjunction with 18th International World Wide Web Conference (WWW 09)
4 Renear A H Sacchi S Wickett K M (2010) Definitions of Dataset in the Scientific and Technical Literature httpmailasistorgasist2010proceedingsproceedingsASIST_AM10submissions240_Final_Submissionpdf
5 W3C Government Linked Data Working Grouphttpwwww3org2011gldwikiMain_Page
6 UK Gov Linked Data Working Group LD registryhttpsgithubcomderukl-registry-pocwiki
Vocabularies and catalogsbull DCAT httpwwww3orgTRvocab-dcat
bull DCAT AP httpsjoinupeceuropaeuassetdcat_application_profileasset_releasedcat-application-
profile-data-portals-europe-final
bull DataCube httppurlorglinked-datacube
bull VOID httprdfsorgnsvoid-guide
bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml
bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology
bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables
bull CKAN httpckanorg
bull Dataverse httpdataverseorg
bull Datahub httpdatahubio
bull DataCite httpsearchdataciteorguiq=subject3Aagriculture
bull Re3data httpwwwre3dataorg
bull OpenAIRE httpswwwopenaireeu
bull CIARD RING httpringciardinfo
Dataset description and DCAT
DRTCISI-ICSUCODATAInternational Workshop on
Open Data Repositories
Thank you
Valeria Pesce
valeriapescefaoorg
Limitations of DCAT
It doesnrsquot cover
bull semantic relations to organisations persons software projects fundinghellip
bull dimensions and variables and syntax semantics of dimensions and variables
bull protocols and parameters for datasets available through APIs
bull time and space slices subsets
Combining DCAT with other vocabularies
ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions attributes and measures
bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)
bull The measure components represent the phenomenon being observed (eg life expectancy)
bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset also non-statistical
1) dimensions and semantics YES2) slices subsets YES
More than DCAT-AP
VOID model
c
dctlicensewvnorms wvwaiver
3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES
More than DCAT-AP
Complementing DCAT
bull For dimensions semantics of dimensions slicingbull DataCube
bull DDI
bull For API aspectsbull VOID (linked data)
bull Web services descriptions (Hydra (WSDL WADL))
bull For relations to organizations projects publications fundinghellipbull CERIF for datasets
bull VIVO Datastar
Many vocabulariesVocabularies with relations to DCAT or same model
bull DCAT-AP and other extensions
bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT
bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)
bull Schemaorg
Other vocabulariesbull DataCite and re3data
bull CERIF for datasets
bull VIVO Datastar
bull INSPIRE
Examples of application of DCAT
bull CKAN data catalog tool(more in your workshop)
bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances
bull CIARD RING federated data catalogmanaged by GFAR
CIARD RING
Datasets in the RING dataset hub
bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)
bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)
bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)
bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary
a ldquoRING DCAT profilerdquo will be published
Sample dataset record
RDF of the recordltrdfDescription rdfabout=httpringciardnetnode19517gt
ltrdftype rdfresource=httpwwww3orgnsdcatDatasetgt
ltrdftype rdfresource=httprdfsorgnsvoidDatasetgt
ltrdftype rdfresource=httpschemaorgDatasetgt
ltrdftype rdfresource=httpwwww3orgnsadmsAssetgt
ltdcttitlegtNational Soil Database representative Soil Systems geographyltns1titlegt
ltschemanamegtNational Soil Database representative Soil Systems geographyltns2namegt
ltdctdescriptiongtNational Soil Database representative Soil Systems geography (1500000) ltns1descriptiongt
ltdcatlandingPage rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltschemaurl rdfresource=httpsoilmapsentecraititabancadati1htmlgt
ltdctpublisher rdfresource=httpringciardnetnode19510gt
ltschemapublisher rdfresource=httpringciardnetnode19510gt
ltdctissued rdfdatatype=xsdgYeargt1990ltns1issuedgt
ltdctspatial rdfresource=httpringciardnettaxonomy_term326gt
ltdctspatialgtNationalltns1spatialgt
ltdcattheme rdfresource=httpringciardnettaxonomy_term2052gt
ltschemaabout rdfresource=httpringciardnettaxonomy_term2052gt
ltdctconformsTo rdfresource=httpringciardnetnode19239gt
ltdctidentifiergthttpringciardnetnode19517rdfltns1identifiergt
ltschemacontentLocation rdfresource=httpringciardnettaxonomy_term326gt
ltdcttype rdfresource=httpringciardnettaxonomy_term81gt
ltdcatcatalog rdfresource=httpringciardnetnode19436gt
ltdcatdistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltschemadistribution rdfresource=httpringciardnetfield_collection_item5055gt
ltvoiddataDump rdfresource=httpringciardnetfield_collection_item5055gt
ltdctrights rdfresource=httpringciardnettaxonomy_term2053gt
ltrdfDescriptiongt
References1 Issues and Recommendations Associated with Distributed Computation and Data
Management Systems for the Space SciencesNational Research Council (US) Space Science BoardNational Academies 1986 - 111 paginehttpsbooksgooglecoukbooksid=h4krAAAAYAAJ
2 Sacchi S Wickett K M amp Renear A H (2010) Dataset definitions Champaign IL Center for Informatics Research in Science and Scholarship (Rep No CIRSSDATACONS--20101VER01+DCDC)
3 Alexander K Cyganiak R Hausenblas M amp Zhao J (2009) Describing Linked Datasets-On the Design and Usage of voiD theVocabulary of Interlinked Datasets In Linked Data on the Web Workshop (LDOW 09) in conjunction with 18th International World Wide Web Conference (WWW 09)
4 Renear A H Sacchi S Wickett K M (2010) Definitions of Dataset in the Scientific and Technical Literature httpmailasistorgasist2010proceedingsproceedingsASIST_AM10submissions240_Final_Submissionpdf
5 W3C Government Linked Data Working Grouphttpwwww3org2011gldwikiMain_Page
6 UK Gov Linked Data Working Group LD registryhttpsgithubcomderukl-registry-pocwiki
Vocabularies and catalogsbull DCAT httpwwww3orgTRvocab-dcat
bull DCAT AP httpsjoinupeceuropaeuassetdcat_application_profileasset_releasedcat-application-
profile-data-portals-europe-final
bull DataCube httppurlorglinked-datacube
bull VOID httprdfsorgnsvoid-guide
bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml
bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology
bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables
bull CKAN httpckanorg
bull Dataverse httpdataverseorg
bull Datahub httpdatahubio
bull DataCite httpsearchdataciteorguiq=subject3Aagriculture
bull Re3data httpwwwre3dataorg
bull OpenAIRE httpswwwopenaireeu
bull CIARD RING httpringciardinfo
Dataset description and DCAT
DRTCISI-ICSUCODATAInternational Workshop on
Open Data Repositories
Thank you
Valeria Pesce
valeriapescefaoorg
Combining DCAT with other vocabularies
ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo
(from the DCAT specification)
DataCube structure definition
A cube is organized according to a set of dimensions attributes and measures
bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)
bull The measure components represent the phenomenon being observed (eg life expectancy)
bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset also non-statistical
1) dimensions and semantics YES2) slices subsets YES
More than DCAT-AP
VOID model
c
dctlicensewvnorms wvwaiver
3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES
More than DCAT-AP
Complementing DCAT
bull For dimensions semantics of dimensions slicingbull DataCube
bull DDI
bull For API aspectsbull VOID (linked data)
bull Web services descriptions (Hydra (WSDL WADL))
bull For relations to organizations projects publications fundinghellipbull CERIF for datasets
bull VIVO Datastar
Many vocabulariesVocabularies with relations to DCAT or same model
bull DCAT-AP and other extensions
bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT
bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)
bull Schemaorg
Other vocabulariesbull DataCite and re3data
bull CERIF for datasets
bull VIVO Datastar
bull INSPIRE
Examples of application of DCAT
bull CKAN data catalog tool(more in your workshop)
bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances
bull CIARD RING federated data catalogmanaged by GFAR
CIARD RING
Datasets in the RING dataset hub
bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)
bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)
bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)
bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary
a ldquoRING DCAT profilerdquo will be published
Sample dataset record
RDF of the record
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500,000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
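A record like the one above can be read with a standard XML parser. The sketch below uses only the Python standard library on a trimmed, well-formed version of the record. The punctuation in the URIs, the enclosing rdf:RDF element and the namespace declarations are assumptions, since the slide shows only the inner fragment. A real RDF library would be more robust than plain XML parsing; this only shows which fields carry what.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the vocabularies used in the record.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Trimmed version of the sample record; the rdf:RDF wrapper is assumed.
RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dct:issued>1990</dct:issued>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree spells namespaced attributes as "{namespace-uri}localname".
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def summarize(rdf_xml: str) -> dict:
    """Extract a few dataset-level fields from the first rdf:Description."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    return {
        "uri": desc.get(RDF_ABOUT),
        "types": [t.get(RDF_RESOURCE) for t in desc.findall("rdf:type", NS)],
        "title": desc.findtext("dct:title", namespaces=NS),
        "issued": desc.findtext("dct:issued", namespaces=NS),
        "distributions": [
            d.get(RDF_RESOURCE) for d in desc.findall("dcat:distribution", NS)
        ],
    }


summary = summarize(RECORD)
print(summary["title"], summary["issued"])
```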
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Dataset description and DCAT
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org
DataCube structure definition
A cube is organized according to a set of dimensions, attributes and measures.
• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
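The three component kinds can be illustrated with a small Python sketch. It does not use the DataCube RDF vocabulary itself; it only mirrors the idea that dimensions identify an observation, measures hold the observed value and attributes qualify it. The component names and the observation values are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # "dimension", "measure" or "attribute"


# Hypothetical structure definition for a life-expectancy cube.
STRUCTURE: List[Component] = [
    Component("time", "dimension"),
    Component("region", "dimension"),
    Component("gender", "dimension"),
    Component("life_expectancy", "measure"),
    Component("unit", "attribute"),
    Component("status", "attribute"),
]


def names(kind: str) -> List[str]:
    """Names of all components of the given kind, in declaration order."""
    return [c.name for c in STRUCTURE if c.kind == kind]


def observation_key(obs: Dict[str, object]) -> tuple:
    """Dimensions identify an observation: return its key, or raise if one is missing."""
    missing = [d for d in names("dimension") if d not in obs]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return tuple(obs[d] for d in names("dimension"))


obs = {
    "time": 2010, "region": "Example region", "gender": "female",
    "life_expectancy": 80.1,                 # made-up measure value
    "unit": "years", "status": "estimated",  # attributes qualify the measure
}
key = observation_key(obs)
print(key)
```

Because the key is built from the dimensions alone, two observations with the same time, region and gender would collide, which is the sense in which dimensions "identify" what is observed.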
DataCube model for dataset structure
This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.
1) dimensions and semantics: YES
2) slices, subsets: YES
More than DCAT-AP
VOID model
[Figure: VOID model diagram; licensing properties shown: dct:license, wv:norms, wv:waiver]
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
More than DCAT-AP
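To make the checklist concrete, here is a minimal Python sketch that writes a VOID description in Turtle. It uses void:dataDump and void:sparqlEndpoint for access, dct:license for re-use conditions and void:subset for subsets. The function name, its parameters and the example.org URIs are hypothetical, and the output is plain string building rather than the work of an RDF library.

```python
from typing import Iterable, Optional

PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def void_description(
    uri: str,
    license_uri: str,
    data_dump: Optional[str] = None,
    sparql_endpoint: Optional[str] = None,
    subsets: Iterable[str] = (),
) -> str:
    """Return a Turtle description of a void:Dataset."""
    pairs = [("a", "void:Dataset"), ("dct:license", f"<{license_uri}>")]
    if data_dump:
        pairs.append(("void:dataDump", f"<{data_dump}>"))
    if sparql_endpoint:
        pairs.append(("void:sparqlEndpoint", f"<{sparql_endpoint}>"))
    pairs.extend(("void:subset", f"<{s}>") for s in subsets)
    # Predicate-object pairs for one subject are separated by ";" in Turtle.
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"{PREFIXES}\n<{uri}>\n    {body} .\n"


# All URIs below are placeholders.
ttl = void_description(
    "http://example.org/dataset",
    "http://example.org/license",
    data_dump="http://example.org/dump.rdf",
    sparql_endpoint="http://example.org/sparql",
    subsets=["http://example.org/dataset/2010"],
)
print(ttl)
```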
Examples of application of DCAT
bull CKAN data catalog tool(more in your workshop)
bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances
bull CIARD RING federated data catalogmanaged by GFAR
CIARD RING
Datasets in the RING dataset hub
• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by a uniform type of content and a uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary
  A “RING DCAT profile” will be published
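The dataset / distribution split described above can be sketched in a few lines of Python. This is only an illustration of the DCAT idea (one dataset, several distributions); the class names, field names and example values are assumptions made for this sketch and are not taken from the RING implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete way of accessing a dataset (format, protocol, URL)."""
    format: str
    protocol: str
    url: str


@dataclass
class Dataset:
    """A dataset with uniform content and structure, exposed via distributions."""
    title: str
    content_type: str
    distributions: List[Distribution] = field(default_factory=list)

    def formats(self) -> List[str]:
        """Return the formats in which this dataset is available."""
        return [d.format for d in self.distributions]


# Hypothetical example values, not taken from the RING.
soil = Dataset(
    title="Example soil dataset",
    content_type="soil data",
    distributions=[
        Distribution("CSV", "HTTP download", "http://example.org/soil.csv"),
        Distribution("RDF/XML", "SPARQL endpoint", "http://example.org/sparql"),
    ],
)

print(soil.formats())
```

The same dataset keeps a single description (title, content type) while each distribution carries only what differs between access points.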
Sample dataset record
RDF of the record:
<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
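A record like the one above can be read with Python's standard library. The sketch below is a minimal illustration, not the RING's own tooling: the `rdf:RDF` wrapper, the namespace declarations, the shortened record and the `summarize` helper are assumptions added here, and the URLs are written with the separators that the slide text appears to have lost. A generic XML parser only handles this particular RDF/XML layout; a dedicated RDF library would be more robust for real harvesting.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, Dublin Core Terms and DCAT.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Shortened copy of the sample record, wrapped in rdf:RDF (assumed wrapper).
RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dct:issued>1990</dct:issued>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""

RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def summarize(rdf_xml: str) -> dict:
    """Extract a few DCAT / Dublin Core fields from the first rdf:Description."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    return {
        "uri": desc.get(RDF_ABOUT),
        "types": [t.get(RDF_RESOURCE) for t in desc.findall("rdf:type", NS)],
        "title": desc.findtext("dct:title", namespaces=NS),
        "issued": desc.findtext("dct:issued", namespaces=NS),
        "distributions": [
            d.get(RDF_RESOURCE) for d in desc.findall("dcat:distribution", NS)
        ],
    }


summary = summarize(RECORD)
print(summary["title"])
print(summary["distributions"])
```

Because the distribution is given as a resource URI rather than inline, a consumer would need a second request to that URI to learn the format and access URL.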
References
1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki
Vocabularies and catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info
Dataset description and DCAT
DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories
Thank you
Valeria Pesce
valeria.pesce@fao.org