
This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.

Project Acronym: DataBio

Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)

Project Full Title: Data-Driven Bioeconomy

Project Coordinator: INTRASOFT International

DELIVERABLE

D4.3 – Data sets, formats and models (Public version)

Dissemination level PU -Public

Type of Document Report

Contractual date of delivery M20 – 31/8/2018

Deliverable Leader SINTEF

Status - version, date Final – v1.0-Public, 12/12/2018

WP / Task responsible WP4 (T4.5 and T4.6)

Keywords: data set, metadata, datastream

Page 2: D4.3 Data sets, formats and models (Public version) · ontologies and potential standard data models and access mechanisms/services and APIs. There will be an increased focus on secure

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Dissemination level: PU -Public Page 2

Executive Summary

The D4.3 document starts with an introduction to the DataBio project and to other documents related to D4.3, followed by an introduction to data sharing and the data economy in the context of DataBio.

The FAIR principles are introduced as a foundation for data findability, accessibility, interoperability and reuse, and as further motivation for metadata and for the discovery of datasets through data registries, in particular the DataBio Hub. The document also outlines options for further support of data sharing and data exchange, in particular through linked data and industrial data platforms.

The context of datasets in DataBio is then presented, including external drivers for data sharing and data exchange, stakeholders and license models. Data interoperability through ontologies, models, formats and standards, and data access through standard services and APIs, are introduced in relation to the DataBio standardisation engagement, in particular in the Geospatial and Earth Observation areas.

Furthermore, an overview of the requirements for datasets and datastreams in DataBio, grouped by pilots and by the platform itself, is presented. This is followed by a detailed description of the existing, improved, new and other relevant datasets in DataBio, using a metadata template derived from the dataset descriptions in the DataBio Hub. The final section gives an example of how a dataset can be used for application development, followed by concluding remarks.

The deliverable also includes contributions from WP5 on the EO datasets and from the WP4 tasks T4.5 Big Data Variety Management and T4.6 Data Acquisition with Security support.

The first phase of the DataBio project has focused on the use and creation of datasets based on the needs and requirements of the DataBio pilots. The next phase will continue this work, but will also place increased focus on the interoperability of datasets through the use of ontologies, potential standard data models, and access mechanisms such as services and APIs. There will also be an increased focus on secure data sharing and data exchange beyond the individual pilots, to support a growing data economy in the DataBio areas of agriculture, forestry and fishery.

This is a public version of Deliverable D4.3 “Data sets, formats and models”. Confidential information from the original document has been omitted.

Relation with Other DataBio Platform Deliverables

The DataBio project includes three piloting work packages (WP1-3) and two related platform work packages that support the pilots (Figure 1): WP4, which handles data in general including IoT data, and WP5, which focuses on Earth Observation and geospatial data. The DataBio platform provides Big Data capabilities to the pilots by forming software pipelines of components through which data flows from the sources in agriculture, forestry and fishery through data management, analytics and visualization stages in the pilots.

[Figure 1 shows the WP1-3 pilots (Agro Pilot 1, Agro Pilot 2, ..., Agro Pilot 13; Forest Pilot 1, Forest Pilot 2, ..., Forest Pilot 7; Fishery Pilot 1, Fishery Pilot 2, ..., Fishery Pilot 6) supported by the DataBio platform with big data components and datasets, which comprises WP4 (components and IoT datasets; Deliverables D4.1, D4.2, D4.3; Milestone M7) and WP5 (components and Earth Observation datasets; Deliverables D5.1, D5.2, D5.3; Milestone M9).]

Figure 1: Work packages and their roles in DataBio

The platform developed in DataBio is described in Deliverables D4.1, D4.2 and D4.3 (WP4) and D5.1, D5.2 and D5.3 (WP5) (Figure 1). Deliverables D4.1-D4.3 define Milestone M7 “Service ready for Trial 1”, whereas Deliverables D5.1-D5.3 define Milestone M9 “EO Services ready for integration”. The platform services and pipelines have been in trials since April 2018 (M16). More specifically, the public deliverable D4.1 Platforms and interfaces describes the software components to be used by the pilots. Most of the components are already in use in the first pilot trials. In addition, D4.1 reports the outcome of a matchmaking process in which the pilots selected which components to deploy.

Deliverable D4.2 Services for tests builds on D4.1 and provides an overview of the component pipelines as identified at month 16 (M16) of the project. It also provides guidelines for successful implementation and deployment of the pipelines. This deliverable, D4.3 Datasets, formats and models, is due at the end of August 2018. While the two earlier reports deal with software modules, this report focuses on the datasets and datastreams employed in DataBio. It addresses the data formats, standards and models that make data easy to find, access, interoperate and reuse (the FAIR principles). In this deliverable we therefore address topics beyond the coverage of single pilots. Deliverable D5.1 EO component specification includes an analysis of the EO dataset and component requirements provided by the pilots. It was published at the end of 2017 and contains an overview of best practices for EO access and the initial component and dataset requirements based on the DataBio pilot needs.

Building on D5.1, Deliverable D5.2 EO component and interfaces describes the Earth Observation component pipelines in the same way as D4.2 does for IoT components. It also includes examples of data experimentation with the pipelines. Deliverable D5.3 EO services and tools builds on D5.1 and D5.2 and describes how the technical components from DataBio can be scaled up to services and tools that are installed as Software as a Service (SaaS) or on premises. It further describes how, and under which conditions, these services and tools can be accessed externally.
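The component pipelines described in D4.2 and D5.2 follow the flow outlined in this section: data moves from a source through data management, analytics and visualization stages. The following minimal Python sketch illustrates only that chaining idea, under stated assumptions: the `Pipeline` class, the three stage functions, the record fields and the alert threshold are all hypothetical and do not correspond to any actual DataBio component or API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical types: a record is a plain dict; a stage maps a list of records to a new list.
Record = Dict[str, Any]
Stage = Callable[[List[Record]], List[Record]]


@dataclass
class Pipeline:
    """Chains stages so that each stage's output is the next stage's input."""
    stages: List[Stage] = field(default_factory=list)

    def then(self, stage: Stage) -> "Pipeline":
        # Append a stage and return self to allow fluent chaining.
        self.stages.append(stage)
        return self

    def run(self, records: List[Record]) -> List[Record]:
        # Feed the records through every stage in order.
        for stage in self.stages:
            records = stage(records)
        return records


# Hypothetical stages standing in for real platform components.
def manage(records: List[Record]) -> List[Record]:
    """Data management: drop records with a missing value."""
    return [r for r in records if r.get("value") is not None]


def analyze(records: List[Record]) -> List[Record]:
    """Analytics: flag values above an illustrative threshold of 30.0."""
    return [{**r, "alert": r["value"] > 30.0} for r in records]


def visualize(records: List[Record]) -> List[Record]:
    """Visualization: attach a human-readable label to each record."""
    return [
        {**r, "label": f"{r['sensor']}: {r['value']}" + (" (!)" if r["alert"] else "")}
        for r in records
    ]


# Hypothetical source data, e.g. readings from three field sensors.
source = [
    {"sensor": "field-1", "value": 21.5},
    {"sensor": "field-2", "value": None},
    {"sensor": "field-3", "value": 33.0},
]

pipeline = Pipeline().then(manage).then(analyze).then(visualize)
for record in pipeline.run(source):
    print(record["label"])  # prints "field-1: 21.5", then "field-3: 33.0 (!)"
```

Because every stage consumes and returns the same record type, stages can be swapped or reordered independently, which is the property that lets different pilots assemble different pipelines from a shared set of components.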


Deliverable Leader: Arne-Jørgen Berre (SINTEF)

Contributors:

Ståle Walderhaug (SINTEF),

Pekka Siltanen (VTT), Caj Södergård (VTT),

Miguel Ángel Esbrí (ATOS), Javier Hitado Simarro (ATOS),

Ephrem Habyarimana (CREA),

Iason Kastanis (CSEM), Margus Freudenthal (CYBER),

Allan Aasbjerg Nielsen (DTU), Marco Corsi (e-geos),

Kostas Akasoglou (EXUS), Ioannis Komnios (EXUS),

Adamantios Maragkos (EXUS), Anuj Sharma (EXUS),

Charikleia Stefanou (EXUS), Dimitris Vassiliadis (EXUS),

Petr Lukes (FMI), Eva Klien (Fraunhofer),

Ivo Senner (Fraunhofer), Fabiana Fournier (IBM),

Inna Skarbovsky (IBM), Christian Zinke (InfAI),

George Bravos (INTRASOFT),

Vassilis Chatzigiannakis (INTRASOFT),

Karel Charvat (LESPRO), Karel Charvat, jr (LESPRO),

Tomas Reznik (LESPRO), Anu Kosunen (METSAK),

Virpi Stenman (METSAK), Seppo Huurinainen (MHGS),

Veli-Matti Plosila (MHGS), Panagiotis Elias (NP),

Kostas Karalas (NP), Stamatis Krommidas (NP),

Kostas Mastrogiannis (NP), Natassa Miliaraki (NP),

Ilias Panos (NP), Menelaos Perdikeas (NP),

Savvas Rogotis (NP), Pavlos Tsagkis (NP),

Marco Folegani (MEEO), Ingo Simonis (OGCE),

Soumya Brahma (PSNC), Raul Palma (PSNC),

Juliusz Pukacki (PSNC), Jarkko Vähäkangas (Senop),

Andrey Sadovykh (Softeam),

Marc Gilles (Spacebel), Yves Coene (Spacebel),

Anca Liana Costea (TerraS), Adrian Stoica (TerraS),

Delia Teleaga (TerraS), Jesus Estrada Villegas (TRAGSA),

Asuncion Roldan Zamarron (TRAGSA), Michal Kepka (UWB),

Karel Jedlička (UWB), Tomas Mildorf (UWB),

Erwin Goor (VITO), Jarmo Kalaoja (VTT),

Tuomas Paaso (VTT), Kari Rainio (VTT), Renne Tergujef (VTT)

Reviewers:

Per Gunnar Auran (SINTEF Fishery)

Tomas Mildorf (UWB)

Virpi Stenman (METSAK)

Approved by: Athanasios Poulakidas (INTRASOFT)


Document History

Version | Date | Contributor(s) | Description

0.1 | 05.06.2018 | Ståle Walderhaug | Initial ToC

0.2 | 21.06.2018 | Ståle Walderhaug / Arne J. Berre | ToC with section assignments

0.3 | 01.08.2018 | | Datasets included from partners

0.4 | 15.08.2018 | Adrian Stoica, Terrasigna | D5.i2 datasets included. Added FAIR data. Examples of use included.

0.5 | 20.08.2018 | Ståle Walderhaug | Updated with license policy information. Added concerns section.

0.6 | 24.08.2018 | Caj Södergård, Ståle Walderhaug | Requirements in place. Datasets updated.

0.7 | 28.08.2018 | Arne J. Berre, Ståle Walderhaug, Caj Södergård | Version for internal review

0.8 | 31.08.2018 | Ståle Walderhaug, Arne J. Berre | Version updated after internal review

1.0 | 31.08.2018 | Athanasios Poulakidas | Final version for submission

1.0-Public | 12.12.2018 | Caj Södergård | Public version of the document


Table of Contents

EXECUTIVE SUMMARY

RELATION WITH OTHER DATABIO PLATFORM DELIVERABLES

TABLE OF CONTENTS

TABLE OF FIGURES

LIST OF TABLES

DEFINITIONS, ACRONYMS AND ABBREVIATIONS

1 INTRODUCTION

1.1 PROJECT SUMMARY

1.2 DOCUMENT SCOPE

1.3 DOCUMENT STRUCTURE

2 BACKGROUND

2.1 DATA SHARING AND DATA ECONOMY IN DATABIO

2.2 FAIR PRINCIPLES

2.3 METADATA AND DISCOVERY OF DATASETS

2.4 DATA REGISTRIES, DATA SHARING AND DATA EXCHANGE

2.4.1 DataBioHub

2.4.2 Linked Data and Open Micka

2.4.3 Industrial data spaces

2.4.4 Openness and payment

2.4.5 UXP – Exchange Platform - Cybernetica

2.5 INDUSTRIAL DATA SPACES AND CONNECTORS

2.5.1 EU Data Portal

2.5.2 GEOSS

2.5.3 DCAT and GeoDCAT

2.5.4 CKAN

2.6 OTHERS

3 CONTEXT VIEW

3.1 EXTERNAL DRIVERS FOR DATA SHARING AND DATA EXCHANGE

3.2 DATA INTEROPERABILITY THROUGH ONTOLOGIES, MODELS, FORMATS AND STANDARDS

3.2.1 Geospatial and Earth Observation ontologies and standards

3.2.2 Agricultural ontologies and standards

3.2.3 Forestry ontologies and standards

3.2.4 Fishery ontologies and standards

3.3 DATA ACCESS THROUGH STANDARD SERVICES AND APIS

3.3.1 Geospatial Standards, Data Types and Services

3.3.2 Sensor Standards, ontologies, data representations

3.3.3 API approach

3.4 STAKEHOLDERS AND CONCERNS

3.5 LICENSE MODELS FOR DATA REUSE

4 REQUIREMENTS VIEW

4.1 TYPES OF EO DATA AND SENSORS USED IN THE DATABIO PILOTS AND THEIR CHARACTERISTICS

4.2 DATASETS AND DATASTREAM REQUIREMENTS FROM PLATFORM

4.3 DATASETS AND DATASTREAM REQUIREMENTS FROM AGRICULTURE PILOTS

4.4 DATASETS AND DATASTREAM REQUIREMENTS FROM FORESTRY PILOTS

4.5 DATASETS AND DATASTREAM REQUIREMENTS FROM FISHERY PILOTS

5 DATASETS: EXISTING, IMPROVED, NEW AND OTHERS

5.1 EXISTING DATASETS UTILIZED BY DATABIO PILOTS

5.1.1 Open Transport Map (UWB - D03.02)

5.1.2 Forest resource data (METSAK - D18.01)

5.1.3 Landsat 8 OLI data

5.1.4 Sentinel 3 OLCI (Ocean and Land Colour Instrument) data

5.1.5 Sentinel 3 SLSTR (Sea and Land Surface Temperature Radiometer)

5.1.6 MODIS data

5.1.7 Proba-V data

5.1.8 Global Precipitation Measurement (GPM) mission data

5.1.9 KNMI (Koninklijk Nederlands Meteorologisch Instituut) precipitation data

5.1.10 CMEMS (Copernicus Marine Environment Monitoring Service) data

5.1.11 Sentinel 2A (ESA D11.01)

5.1.12 Sentinel-2 Data

5.1.13 Sentinel 3 SRAL (Synthetic Aperture Radar Altimeter) data

5.1.14 Sentinel 3 MWR (Microwave Radiometer) data

5.2 DATASETS IMPROVED BY DATABIO

5.2.1 RPAS (Remotely Piloted Aircraft Systems) data

5.2.2 Orthophotos

5.2.3 gaiasense field (D13.01)

5.2.4 Land use and properties - Greek agriculture pilots (NP - D13.02)

5.2.5 Customer and forest estate data (METSAK - D18.02)

5.3 NEW DATASETS CREATED DURING DATABIO

5.3.1 Canopy height map (FMI - D14.05)

5.3.2 Orthophotos (IGN - D11.02)

5.3.3 GEOSS sources (D11.03)

5.3.4 RPAS data (Tragsa - D11.04)

5.3.5 MFE Spanish Forest Map (D11.06)

5.3.6 Field data - pilot B2 (Tragsa - D11.07)

5.3.7 Forest damage (FMI - D14.07)

5.3.8 Open Forest Data (METSAK - D18.01)

5.3.9 Hyperspectral image orthomosaic (Senop - D44.02)

5.3.10 Leaf area index (FMI - D14.06)

5.3.11 NASA CMR Landsat Datasets via FedEO Gateway (SPACEBEL - D07.02)

5.3.12 Ontology for (Precision) Agriculture (PSNC - D09.01)

5.3.13 Open Land Use (Lespro - D02.01)

5.3.14 Phenomics, metabolomics, genomics and environmental datasets (CERTH - DS40.01)

5.3.15 Quality control data (METSAK - D18.04)

5.3.16 Sentinels Scientific Hub Datasets via FedEO Gateway (SPACEBEL - D07.01)

5.3.17 SigPAC (Tragsa - D11.05)

5.3.18 Smart POI dataset (Lespro - D02.01)

5.3.19 Stand age map (FMI - D14.04)

5.3.20 Storm and forest damage observations and possible risk areas (METSAK - D18.03a)

5.3.21 Forest road condition observations (METSAK - D18.03b)

5.3.22 Tree species map (FMI - D14.03)

5.3.23 Wuudis data (MHGS - D20.01)

5.4 RECOMMENDED INTERACTION STRUCTURES: ATOS

CONCLUDING REMARKS

REFERENCES

APPENDIX A METADATA TEMPLATE TABLE

Table of Figures

FIGURE 1: WORK PACKAGES AND THEIR ROLES IN DATABIO

FIGURE 2: HOW DISTRIBUTED STORAGE AND PAYMENTS WORK

FIGURE 3: FUNCTIONAL ARCHITECTURE OF THE INDUSTRIAL DATA SPACE

FIGURE 4: OPENAIRE

FIGURE 5: DRYAD

FIGURE 6: THE FLUX STANDARDS AND STATUS (FROM UN ESCAP PRESENTATION OF DR HEINER LEHR) [REF-37]

FIGURE 7: ARCHIMATE STRATEGY DIAGRAM SHOWING HOW THE PILOT SYSTEM WILL REALIZE THE DEFINED GOALS

FIGURE 8: ARCHIMATE BUSINESS DIAGRAM SHOWING THE DATA PROCESSING, DATASETS AND ACTORS INVOLVED

FIGURE 9: ARCHIMATE DATA VIEW FOR ONE OF THE FISHERY PILOTS (B2)

FIGURE 10: THE B2 FISHERY PILOT LIFECYCLE VIEW SHOWING HOW DATA IS PROVIDED AS INPUT TO PROCESSING STEPS

FIGURE 11: THE B2 FISHERY PILOT PIPELINE VIEW SHOWING HOW DATASETS ARE INTERFACED

FIGURE 12: EO DATA COLLECTION CONTEXT

List of Tables

TABLE 1: THE DATABIO CONSORTIUM PARTNERS

TABLE 2: TYPES OF DATA USED IN DATABIO PILOT PROJECTS


Definitions, Acronyms and Abbreviations

Acronym/Abbreviation Title

ADES Application Deployment and Execution Service

AMS Application Management Client

API Application programming interface

ArchiMate ArchiMate® Specification, modelling language for Enterprise Architecture

ATOM ATOM (Syndication Format)

BDVA Big Data Value Association

CAP Common Agricultural Policy

CCSDS Consultative Committee for Space Data Systems

CEOS Committee on Earth Observing Satellites

CEP Complex Event Processing

CETL Connect Extract Transform and Load

CKAN Comprehensive Knowledge Archive Network

CMEMS Copernicus Marine Environment Monitoring Service

CMR Common Metadata Repository

CPS Cyber Physical Systems

CSW Catalogue Service for Web

DCAT Data Catalog Vocabulary

DDS Data Distribution System

DEI Digitising European Industry

DIAS Data and Information Access Services

DSL Domain Specific Language

DWG Domain Working Group

ECMWF European Centre for Medium-Range Weather Forecasts

ECSS European Cooperation for Space Standardization

EO Earth Observation

ERS European Remote Sensing Satellite

ESA European Space Agency

FAD Fish Aggregating Devices

FTP File Transfer Protocol

GEMET GEneral Multilingual Environmental Thesaurus

GEO Group on Earth Observations

GSCDA GMES Space Component Data Access

GUI Graphical User Interface

HLA High Level Architecture

HPC High-Performance Computing

HTTP Hypertext Transfer Protocol


IAS Invasive Alien Species

IDP Industrial Data Platform

IETF Internet Engineering Task Force

INSPIRE Infrastructure for Spatial Information in Europe

IoT Internet of Things

ISO International Organisation for Standardisation

JSON JavaScript Object Notation

KMI Koninklijk Meteorologisch Instituut

KML Keyhole Markup Language

KNMI Koninklijk Nederlands Meteorologisch Instituut

LPIS Land Parcel Identification System

NASA National Aeronautics and Space Administration

NG Next Generation

NIST National Institute of Standards and Technology

NN Nearest Neighbors

OAIS Open Archival Information System

OASIS Organization for the Advancement of Structured Information Standards

ODBC Open Database Connectivity

OGC Open Geospatial Consortium

OLCI Ocean and Land Colour Instrument

OLU Open Land Use

OTM Open Transport Map

PaaS Platform as a Service

PDP Research Data Platform

PPP Public-Private Partnership

PROTON IBM PROactive Technology ONline

RDF Resource Description Framework

REST REpresentational State Transfer

RMSE Root Mean Square Error

RPAS Remotely Piloted Aircraft Systems

SaaS Software as a Service

SAFE Standard Archive Format for Europe

SIG Special Interest Groups

SLSTR Sea and Land Surface Temperature Radiometer

SRIA Strategic Research and Innovation Agenda

STIM Smart Transducer Interface Module (from IEEE standard)

SVM Support Vector Machines

SWG Standards Working Group

TCP Transmission Control Protocol


TEP Thematic Exploitation Platform

TMS Tile Map Service

UDP Urban/City Data Platform

UI User Interface

UMM Unified Metadata Model

URL Uniform Resource Locator

W3C World Wide Web Consortium

WCPS Web Coverage Processing Service

WCS Web Coverage Service

WFS Web Feature Service

WGISS Working Group on Information Systems and Services

WMS Web Map Service

WMTS Web Map Tile Service

WP Work Package

WPS Web Processing Service

WTZ Warning time horizon

XFDU XML Formatted Data Units

XML eXtensible Markup Language

Term Definition

Commercial Mission: The products from high-resolution and very-high-resolution commercial missions are purchased on the market. The term “commercial” is used to denote both optical and radar missions.

Dataset: An identifiable collection of data. In the EO community, a dataset is typically called a “product”.

Dataset Series: A collection of datasets sharing the same product specification. In the EO community, a dataset series is also called a “collection”, or a “dataset” (in GSCDA).

Exploitation Platform: An Exploitation Platform is a virtual workspace providing the user community with access to (i) large volumes of data (EO and non-space data), (ii) an algorithm development and integration environment, (iii) processing software and services (e.g. toolboxes, retrieval baselines, visualization routines), (iv) computing resources (e.g. hybrid cloud/grid), (v) collaboration tools (e.g. forums, wiki, knowledge base, open publications, social networking), and (vi) general operation capabilities (e.g. user management and access control, accounting).

SAFE Format: The SAFE (Standard Archive Format for Europe) has been designed to act as a common format for archiving and conveying data within ESA Earth Observation archiving facilities.


Particular care has been taken to ensure that SAFE conforms to the ISO 14721:2003 OAIS (Open Archival Information System) reference model and related standards, such as the emerging CCSDS/ISO XFDU (XML Formatted Data Units) packaging format.

Sentinel-1: The Copernicus Sentinel-1 Earth observation mission, developed by ESA, provides continuity of data from the ERS and Envisat missions, with further enhancements in terms of revisit, coverage, timeliness and reliability of service. The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites operating day and night and performing C-band synthetic aperture radar imaging, which enables them to acquire imagery regardless of the weather. The two-satellite constellation offers a 6-day revisit time.

A summary of mission objectives is:

● Monitoring sea ice zones and the Arctic environment, and surveillance of the marine environment;

● Monitoring land surface motion risks;

● Mapping of land surfaces: forest, water and soil;

● Mapping in support of humanitarian aid in crisis situations.

Spatial resolution: 5 m, 20 m, 40 m.

Source: Wikipedia and Sentinel Online Web site (https://sentinels.copernicus.eu).

Sentinel-2: The Copernicus Sentinel-2 Earth observation mission, developed by ESA, provides continuity to services relying on multi-spectral high-resolution optical observations over global terrestrial surfaces. Sentinel-2 sustains the operational supply of data for services such as forest monitoring, land cover change detection and natural disaster management.

The Sentinel-2 mission offers an unprecedented combination of the following capabilities:

● Multi-spectral information with 13 bands in the visible, near-infrared and short-wave infrared parts of the spectrum;

● Systematic global coverage of land surfaces from 56° South to 84° North, coastal waters and the whole Mediterranean Sea;

● High revisit: every 5 days at the equator under the same viewing conditions;

● High spatial resolution: 10 m, 20 m and 60 m;

● Wide field of view: 290 km.

Source: Wikipedia and Sentinel Online Web site (https://sentinels.copernicus.eu).

Sentinel-3: The main objective of the Copernicus Sentinel-3 Earth observation mission, developed by ESA, is to measure sea-surface topography, sea- and land-surface temperature, and ocean- and land-surface colour.

A pair of Sentinel-3 satellites will enable a short revisit time of less than two days for the OLCI instrument and less than one day for SLSTR at the equator.


Mission objectives are:

● Measure sea-surface topography, sea-surface height and significant wave height;

● Measure ocean and land-surface temperature;

● Measure ocean and land-surface colour;

● Monitor sea and land ice topography;

● Sea-water quality and pollution monitoring;

● Inland water monitoring, including rivers and lakes;

● Aid marine weather forecasting with acquired data;

● Climate monitoring and modelling;

● Land-use change monitoring;

● Forest cover mapping;

● Fire detection;

● Weather forecasting;

● Measuring Earth's thermal radiation for atmospheric applications.

The Sentinel-3A mission has now reached full operational capacity, and preparations for the Sentinel-3B launch are ongoing (mission status on 6 December 2017).

Sources: Wikipedia and Sentinel Online Web site (https://sentinels.copernicus.eu).

Third Party Mission: ESA uses its multi-mission ground systems to acquire, process, archive and distribute data from other satellites, so-called Third Party Missions. Source: http://earth.esa.int/missions/thirdpartymission/.


Introduction

1.1 Project Summary

The data-intensive target sector selected for the DataBio project is the Data-Driven Bioeconomy. DataBio focuses on utilizing Big Data to contribute to the production of the best possible raw materials from agriculture, forestry and fishery/aquaculture for the bioeconomy industry, in order to output food, energy and biomaterials, while also taking into account various responsibility and sustainability issues.

DataBio will deploy state-of-the-art big data technologies and existing partners’ infrastructure and solutions, linked together through the DataBio Platform. These will aggregate Big Data from the three identified sectors (agriculture, forestry and fishery), process them intelligently and allow the three sectors to selectively utilize numerous platform components according to their requirements. The execution will be through continuous cooperation of end-user and technology provider companies, bioeconomy and technology research institutes, and stakeholders from the big data value PPP programme.

DataBio is driven by the development, use and evaluation of a large number of pilots in the three identified sectors, in which associated partners and additional stakeholders are also involved. The selected pilot concepts will be transformed into pilot implementations using co-innovative methods and tools. The pilots select and utilize the most suitable market-ready or almost market-ready ICT, Big Data and Earth Observation methods, technologies, tools and services to be integrated into the common DataBio Platform.

Based on the pilot results and the new DataBio Platform, new solutions and new business

opportunities are expected to emerge. DataBio will organize a series of trainings and

hackathons to support its take-up and to enable developers outside the consortium to design

and develop new tools, services and applications based on and for the DataBio Platform.

The DataBio consortium is listed in Table 1. For more information about the project see

www.databio.eu.

Table 1: The DataBio consortium partners

Number | Name | Short name | Country
1 (CO) | INTRASOFT INTERNATIONAL SA | INTRASOFT | Belgium
2 | LESPROJEKT SLUZBY SRO | LESPRO | Czech Republic
3 | ZAPADOCESKA UNIVERZITA V PLZNI | UWB | Czech Republic
4 | FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. | Fraunhofer | Germany
5 | ATOS SPAIN SA | ATOS | Spain


6 | STIFTELSEN SINTEF | SINTEF ICT | Norway
7 | SPACEBEL SA | SPACEBEL | Belgium
8 | VLAAMSE INSTELLING VOOR TECHNOLOGISCH ONDERZOEK N.V. | VITO | Belgium
9 | INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK | PSNC | Poland
10 | CIAOTECH Srl | CiaoT | Italy
11 | EMPRESA DE TRANSFORMACION AGRARIA SA | TRAGSA | Spain
12 | INSTITUT FUR ANGEWANDTE INFORMATIK (INFAI) EV | INFAI | Germany
13 | NEUROPUBLIC AE PLIROFORIKIS & EPIKOINONION | NP | Greece
14 | Ústav pro hospodářskou úpravu lesů Brandýs nad Labem | UHUL FMI | Czech Republic
15 | INNOVATION ENGINEERING SRL | InnoE | Italy
16 | Teknologian tutkimuskeskus VTT Oy | VTT | Finland
17 | SINTEF FISKERI OG HAVBRUK AS | SINTEF Fishery | Norway
18 | SUOMEN METSAKESKUS-FINLANDS SKOGSCENTRAL | METSAK | Finland
19 | IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD | IBM | Israel
20 | MHG SYSTEMS OY - MHGS | MHGS | Finland
21 | NB ADVIES BV | NB Advies | Netherlands
22 | CONSIGLIO PER LA RICERCA IN AGRICOLTURA E L'ANALISI DELL'ECONOMIA AGRARIA | CREA | Italy
23 | FUNDACION AZTI - AZTI FUNDAZIOA | AZTI | Spain
24 | KINGS BAY AS | KingsBay | Norway
25 | EROS AS | Eros | Norway
26 | ERVIK & SAEVIK AS | ESAS | Norway
27 | LIEGRUPPEN FISKERI AS | LiegFi | Norway
28 | E-GEOS SPA | e-geos | Italy
29 | DANMARKS TEKNISKE UNIVERSITET | DTU | Denmark
30 | FEDERUNACOMA SRL UNIPERSONALE | Federu | Italy
31 | CSEM CENTRE SUISSE D'ELECTRONIQUE ET DE MICROTECHNIQUE SA - RECHERCHE ET DEVELOPPEMENT | CSEM | Switzerland
32 | UNIVERSITAET ST. GALLEN | UStG | Switzerland
33 | NORGES SILDESALGSLAG SA | Sildes | Norway
34 | EXUS SOFTWARE LTD | EXUS | United Kingdom
35 | CYBERNETICA AS | CYBER | Estonia
36 | GAIA EPICHEIREIN ANONYMI ETAIREIA PSIFIAKON YPIRESION | GAIA | Greece
37 | SOFTEAM | Softeam | France
38 | FUNDACION CITOLIVA, CENTRO DE INNOVACION Y TECNOLOGIA DEL OLIVAR Y DEL ACEITE | CITOLIVA | Spain
39 | TERRASIGNA SRL | TerraS | Romania


40 | ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS | CERTH | Greece
41 | METEOROLOGICAL AND ENVIRONMENTAL EARTH OBSERVATION SRL | MEEO | Italy
42 | ECHEBASTAR FLEET SOCIEDAD LIMITADA | ECHEBF | Spain
43 | NOVAMONT SPA | Novam | Italy
44 | SENOP OY | Senop | Finland
45 | UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA | EHU/UPV | Spain
46 | OPEN GEOSPATIAL CONSORTIUM (EUROPE) LIMITED LBG | OGCE | United Kingdom
47 | ZETOR TRACTORS AS | ZETOR | Czech Republic
48 | COOPERATIVA AGRICOLA CESENATE SOCIETA COOPERATIVA AGRICOLA | CAC | Italy
49 | SINTEF AS | SINTEF | Norway

1.2 Document Scope

The main objective of this deliverable is to describe the datasets utilized, improved and created in the DataBio project. A secondary objective is to show how the datasets are identified through a model-driven design process based on ArchiMate, involving the 26 pilot systems in the DataBio project.

In addition to this deliverable, the datasets will be provided through the DataBioHub, including important ArchiMate design diagrams.

1.3 Document Structure

This document comprises the following chapters:

Chapter 1 presents an introduction to the project and the document.

Chapter 2 introduces data sharing and the data economy in the context of DataBio.

Chapter 3 presents the context view of datasets in DataBio, including external drivers,

stakeholders and license models.

Chapter 4 provides an overview of the requirements for datasets and datastreams in DataBio, grouped by pilots and the platform itself.

Chapter 5 presents the datasets in DataBio: existing, improved, new and other relevant datasets. The final subsection gives an example of how a dataset can be used for application development.

Chapter 6 presents the concluding remarks.

This is a public version of Deliverable D4.3 “Data sets, formats and models”. Confidential information from the original document has been omitted.


Background

2.1 Data sharing and data economy in DataBio

As part of the Digital Single Market strategy and of building a European data economy, the European Commission adopted the Communication ‘Towards a common European data space’ in April 2018 [REF-01]. The document proposes a roadmap to “a common data space in the EU - a seamless digital area with the scale that will enable the development of new products and services based on data.” The DataBio domains (agriculture, forestry and fishery) are key areas in which the Commission expects businesses to use data sharing through the data space to improve products and productivity. The Commission document identifies the reuse of public and publicly funded data as a cornerstone of the data space, and the Commission has launched the “European Open Data Portal” to stimulate this development [REF-02].

An important factor in realizing a common data space is encouraging private businesses and public agencies to share both private and public datasets. The guide to “Building a European data economy” states that digital data “is an essential resource for economic growth, competitiveness, innovation, job creation and societal progress in general” [REF-03]. Digital data should be shared in both business-to-business (B2B) and business-to-government (B2G) contexts. The DataBio pilots involve many private stakeholders that produce, consume and share datasets/datastreams. The pilots will demonstrate how data can be shared and utilized in order to improve the quality and efficiency of pilot systems. All datasets and datastreams involved in realizing the pilot systems are identified and documented in the platform and pilot ArchiMate models. These models relate the datasets to the pilots and interfaces, providing traceability from pilot to data, components and pipelines.

The DataBio datasets and datastreams are examples of B2B and B2G data sharing, and are documented here in terms of:

1) Rich metadata: each dataset is described with relevant metadata elements following best practice and harmonized with, e.g., the Transforming Transport datasets [REF-04].

2) Data portal, the DataBioHub: each dataset and datastream is registered in the DataBioHub, a data portal from DataBio.

3) Examples: relevant examples of how to utilize datasets from DataBio are provided in this document.

2.2 FAIR Principles

Most datasets from publicly funded research are still inaccessible to the majority of scientists in the same discipline, not to mention other potential users of the data, such as company R&D departments. About 80% of research data is not in a trusted repository. However, even when data are openly available in repositories, this is not always enough: as a current example, only 18% of the data in open repositories is reusable [REF-05]. This leads to inefficiencies and delays; in recent surveys, data scientists reported that collecting and cleaning data sources made up 80% of their work [REF-06]. These figures can reasonably be assumed to hold for the bioeconomy sector as well.


In response to these challenges, the Commission has launched a large effort with the objective of creating “a European Open Science Cloud to make science more efficient and productive and let millions of researchers share and analyse research data in a trusted environment across technologies, disciplines and borders” [REF-07]. The initial outline for the European Open Science Cloud (EOSC) was laid out in the report from the High Level Expert Group (Moons et al., 2016). This report promotes the FAIR Data Principles, a set of guiding principles intended to promote maximum use of research data (Wilkinson et al., 2016). The FAIR principles were created in a workshop in 2014 and are intended to give “a minimal set of community-agreed guiding principles and practices” [REF-08]. Both humans and machines should be enabled to find (F), access (A), interoperate (I) and re-use (R) research data and metadata in an effortless but confined fashion. These principles provide guidance for scientific data management and stewardship and are relevant to all stakeholders in the current digital ecosystem. Since 2017, a data management plan based on FAIR has been mandatory in all EU Horizon projects [REF-09]. The FAIR principles are advanced by the GO FAIR initiative (https://www.go-fair.org/) [REF-10]. Currently, Germany, France and the Netherlands are part of this initiative.

As regards DataBio, the project has implemented the Data Management Plan (DMP), which is part of the project proposal. The plan, which constitutes Deliverable D6.2¹, covers descriptions of the DataBio datasets, data standards, data sharing and long-term preservation of data. The DMP is also an important tool for the dissemination and exploitation activities. Data privacy and ownership are essential elements, which are dealt with in T4.6.

The DataBioHub [REF-11], described below in Section 2.4.1, is a central tool for our project in realising data management and data sharing. In addition to offering searchable public and private dataset descriptions, it also contains descriptions of DataBio components, pipelines and pilots, as well as of their mutual relations. The hub makes the DataBio data findable by publishing the metadata according to best practices and standards (geospatial and others) and by applying search keywords (tags) to the digital objects. The data is also accessible from the DataBioHub repository, although in some cases only indirectly, by consulting the dataset owner, when the Hub only contains the metadata. DataBioHub typically contains information about the APIs, the data model and formats, as well as about the access methods. The hub also promotes interoperability, as the metadata and data often, but not always, follow established standards, e.g. in the Earth Observation field. Finally, for reusability, licensing schemes are essential to permit the widest possible reuse. When will restricted data be made available for reuse? Are the data produced and/or used in the project usable by third parties, in particular after the end of the project? How long is it intended that the data remain re-usable?
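The way the hub supports each FAIR principle can be made concrete as a simple completeness check on a dataset description. The sketch below is a minimal illustration under stated assumptions: the mapping from principles to fields and all field names (`identifier`, `access_url`, `license`, and so on) are hypothetical, not the DataBioHub metadata schema, and the presence of a field is only a proxy for a dataset actually being FAIR.

```python
from __future__ import annotations

# Hypothetical mapping from each FAIR principle to metadata fields that
# support it. Field names are illustrative; they are not the DataBioHub schema.
FAIR_FIELDS: dict[str, tuple[str, ...]] = {
    "Findable": ("identifier", "title", "keywords"),
    "Accessible": ("access_url", "contact"),
    "Interoperable": ("format", "standard"),
    "Reusable": ("license",),
}


def fair_gaps(record: dict[str, object]) -> dict[str, list[str]]:
    """For each FAIR principle, list the supporting fields that are missing or empty."""
    return {
        principle: [name for name in names if not record.get(name)]
        for principle, names in FAIR_FIELDS.items()
    }


# A metadata-only description: the data itself must be requested from the owner.
record = {
    "identifier": "databio:dataset:example-1",  # hypothetical identifier
    "title": "Example field boundaries",
    "keywords": ["agriculture", "parcels"],
    "contact": "owner@example.org",
    "format": "GeoJSON",
    "standard": "",  # no declared standard yet
    "license": "CC-BY-4.0",
}
gaps = fair_gaps(record)
print(gaps)
```

Reporting gaps per principle, rather than a single pass/fail flag, mirrors the text: a metadata-only entry is still findable and reusable, while its missing `access_url` simply signals that access goes through the dataset owner.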

The DataBio data management plan related to FAIR principles is described in chapter 3 of

[REF-20].

1 https://www.databio.eu/wp-content/uploads/2017/05/DataBio_D6.2-Data-Management-Plan_v1.0_2017-06-30_CREA.pdf


2.3 Metadata and discovery of datasets

Discoverability of (open) geo-information is vital to increase the use of geospatial data within and outside the geospatial expert community. European experience supports this. In 2003, Directive 2003/98/EC (also known as PSI – Public Sector Information) established a minimum set of rules governing both the re-use and the practical means of facilitating the re-use of existing documents held by public sector bodies in the European Union. In the end, Directive 2003/98/EC had only a partial impact in the field of data re-use; it was often hard even to discover which data were available for re-use. In 2007, Directive 2007/2/EC (also known as INSPIRE – INfrastructure for SPatial InfoRmation in Europe) was established, chiefly to make it easier to discover available spatial data and services. Moreover, discovery mechanisms represent one of the bridges between geospatial and non-geospatial approaches to metadata management.

Metadata in the DataBio project are extraordinarily diverse in terms of their structure, encodings, the kinds of resources they describe, their handling and their publication. “Big metadata” approaches need to be developed, since metadata also exhibit three of the four Vs of Big Data: variety, veracity and velocity. Volume is not an issue, as metadata are typically small, on the scale of kilobytes or at most megabytes. Nevertheless, traditional metadata approaches assume static resources and long-lived metadata records, assumptions that do not hold once variety and velocity are taken into account. Veracity of metadata has always been an issue, not least because data and metadata updates are only loosely integrated. The DataBio approach therefore aims at the following goals for metadata and discovery:

1. Tie data and metadata together: ensure that metadata stay up to date despite the velocity of Big Data updates.

2. Support metadata heterogeneity: enable discovery of static resources (e.g. datasets) as well as mobile and other resources (e.g. sensors active during agricultural machinery fleet tracking) in a unified platform.

3. Use efficient encodings: support XML-based formats for backwards compatibility, while also using forward-looking lightweight and semantics-based formats.

4. Integrate metadata in other tools: the best metadata platform is one where users do not notice that they are working with metadata.
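Goal 1 can be illustrated by making the data append and the metadata update a single operation, so the description cannot lag behind a fast-moving datastream. The following is a minimal in-memory sketch, assuming a stream of timestamped numeric observations; the class names, the metadata fields and the sensor stream name are all hypothetical, and a real platform would persist the data and publish the updated record to a catalogue.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Metadata:
    """Hypothetical metadata fields derived from the data itself."""
    record_count: int = 0
    temporal_start: datetime | None = None
    temporal_end: datetime | None = None
    modified: datetime | None = None


@dataclass
class Datastream:
    """Ties data and metadata together: every append also updates the metadata."""
    name: str
    records: list[tuple[datetime, float]] = field(default_factory=list)
    metadata: Metadata = field(default_factory=Metadata)

    def append(self, timestamp: datetime, value: float) -> None:
        # Store the observation and update the metadata in the same step,
        # so the description never lags behind the data.
        self.records.append((timestamp, value))
        md = self.metadata
        md.record_count += 1
        if md.temporal_start is None or timestamp < md.temporal_start:
            md.temporal_start = timestamp
        if md.temporal_end is None or timestamp > md.temporal_end:
            md.temporal_end = timestamp
        md.modified = datetime.now(timezone.utc)


stream = Datastream("tractor-fuel-sensor")  # hypothetical machinery sensor stream
stream.append(datetime(2018, 5, 2, 9, 0, tzinfo=timezone.utc), 12.5)
stream.append(datetime(2018, 5, 2, 8, 30, tzinfo=timezone.utc), 11.9)  # late arrival
print(stream.metadata.record_count, stream.metadata.temporal_start)
```

Deriving the temporal extent from the observations, rather than entering it by hand, also addresses the veracity concern: out-of-order arrivals still yield a correct extent.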

2.4 Data registries, data sharing and data exchange

The datasets of DataBio are registered in the DataBio Hub. It is also relevant to register datasets in other data registries, such as GEOSS.

Earth Observation (EO) datasets are of major importance for the DataBio project, and their management and access are described in more detail in deliverables D5.1 and D5.i2. As an example, Sentinel products available on the Sentinels Scientific Data Hub (Sentinel-1, Sentinel-2) can be discovered and accessed via the FedEO Gateway (C07.01), which returns Sentinel collection and dataset metadata (including the product download URL) through an OGC 13-026r8 OpenSearch interface.
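OpenSearch clients build query URLs by filling a URL template advertised in the server's OpenSearch description document. Each `{name}` placeholder is required, and each `{name?}` placeholder is optional and is replaced by an empty string when the client has no value for it. The sketch below illustrates only this substitution step. The endpoint `fedeo.example`, the template itself and the collection identifier are hypothetical; a real client would read the template from the FedEO description document. The parameter names `eo:parentIdentifier`, `time:start`, `time:end` and `geo:box` follow the EO, Time and Geo OpenSearch extensions referenced by OGC 13-026r8, but namespace-prefix handling is simplified here.

```python
from __future__ import annotations

import re
from urllib.parse import quote

# Matches template parameters such as {count} or {geo:box?}.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")

# Hypothetical template; real ones come from the service's description document.
TEMPLATE = (
    "https://fedeo.example/opensearch?parentIdentifier={eo:parentIdentifier}"
    "&startDate={time:start?}&endDate={time:end?}&bbox={geo:box?}&maximumRecords={count?}"
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Substitute OpenSearch template parameters with URL-encoded values."""
    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters without a value become empty strings
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return _PARAM.sub(substitute, template)


url = fill_template(TEMPLATE, {
    "eo:parentIdentifier": "EXAMPLE:SENTINEL-2",  # hypothetical collection id
    "time:start": "2018-06-01",
    "geo:box": "10,50,11,51",  # west,south,east,north in degrees
})
print(url)
```

The returned URL would then be fetched with any HTTP client; the response (Atom or GeoJSON, depending on the server) carries the product metadata and download links mentioned above.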


Industrial data platforms that support data sharing, data exchange and data access are now also emerging, and the DataBio project aims to take advantage of these in the next phase of the project. The DataBioHub is described below, followed by a description of other relevant data registries and data platforms.

2.4.1 DataBioHub

DataBioHub [REF-11] provides a registry of the project components, pipelines and pilots, allowing easy search across the different project entities. The hub is dynamic and is being updated with more functionalities and resources as the project evolves. The datasets applied in the project will be added to this hub, so that they can be searched in combination with the other project resources.

It is important to note that the DataBio Hub does not offer a repository or operating environment for the service instances and datasets themselves, as those instances will be running on the service providers’ servers or cloud infrastructure (or a DataBio-provided cloud).

Regardless of the running environment of the service instances, DataBioHub offers descriptions and endpoints of all DataBio-platform-compatible services and components (and possibly applications) in a single location and with a coherent description.

Initially, two publicly available instances of the complete digital service registry exist: one as a project deliverable at icare.erve.vtt.fi/ServiceRegistryWeb, and one that is public and free for non-commercial R&D use at www.digitalserviceshub.com.

A new service registry instance has been provided for the DataBio project and can be found at http://www.databiohub.eu/. The instance has been installed on a virtual machine on Microsoft Azure’s cloud computing service. Infrastructure as a Service (IaaS) allows easy server management and scaling up of computing power and resources when needed. The virtual machine runs Ubuntu Linux, and the whole machine is backed up in a geographically redundant recovery service vault.

The digital service registry has been tailored for DataBio use, including the following developments:

• As the service registry was initially developed to register digital services with mainly machine-readable interface descriptions, vocabulary support had to be added for new categories of software components, such as applications with both human and technical interfaces.

• New interface technologies, such as OpenSearch for satellite image services, have been added to the service hub interface description vocabularies.

• DataBio pilot descriptions, data processing pipelines developed in the pilots, and component descriptions are now also included in the registry. Dataset descriptions will be added when deliverable D4.3 is submitted.

• Specific keyword rules have been enforced for DataBio to link descriptions to the BDVA reference architecture and to help link component and service descriptions to the overall architecture of the DataBio platform and pilots.


• New fields for human-readable descriptions have been added to improve linking to pilot development, data models and DataBio deliverables, with the possibility to include component diagrams exported from the DataBio architecture models as images.

• The Service Hub UI and its website have been tailored for DataBio and linked with other DataBio project websites.

• Registration of new users outside the DataBio consortium has been restricted during DataBio platform development.
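The registry behaviour described above (descriptions and endpoints only, typed entries, and keyword rules tied to a reference architecture) can be sketched as a small in-memory class. This is an illustration under stated assumptions, not the DataBioHub implementation: the class names, the entry types and the tag vocabulary are hypothetical, and the tags are only an assumed subset of BDVA reference model concerns standing in for the actual DataBio keyword rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed, illustrative subset of BDVA reference model concerns used as a
# controlled keyword vocabulary; the actual DataBioHub rules are not reproduced.
ALLOWED_TAGS = {
    "Data Management",
    "Data Processing Architectures",
    "Data Analytics",
    "Data Visualisation and User Interaction",
    "Data Protection",
}
ENTRY_TYPES = {"component", "pipeline", "pilot", "dataset"}


@dataclass
class Entry:
    name: str
    kind: str      # one of ENTRY_TYPES
    endpoint: str  # where the service or data actually lives (not in the hub)
    tags: set[str] = field(default_factory=set)


class Registry:
    """Stores descriptions and endpoints only; the resources run elsewhere."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def register(self, entry: Entry) -> None:
        # Enforce the keyword rules at registration time.
        if entry.kind not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {entry.kind}")
        unknown = entry.tags - ALLOWED_TAGS
        if unknown:
            raise ValueError(f"tags outside the controlled vocabulary: {sorted(unknown)}")
        self._entries[entry.name] = entry

    def search(self, tag: str, kind: str | None = None) -> list[str]:
        """Return names of entries carrying `tag`, optionally filtered by type."""
        return sorted(
            e.name
            for e in self._entries.values()
            if tag in e.tags and (kind is None or e.kind == kind)
        )


hub = Registry()
hub.register(Entry("ndvi-pipeline", "pipeline", "https://example.org/ndvi", {"Data Analytics"}))
hub.register(Entry("parcel-dataset", "dataset", "https://example.org/parcels",
                   {"Data Management", "Data Analytics"}))
print(hub.search("Data Analytics", kind="dataset"))
```

Rejecting unknown tags at registration, rather than at search time, keeps every stored description linkable to the reference architecture, which is the purpose the keyword rules serve.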

2.4.2 Linked Data and Open Micka

The best practices for the publication of Linked Data were described in the previous deliverable D4.i1, Section “Linked Data Publication Pipeline”. In this section, we summarize the most relevant practices, which have been applied during the DataBio project.

In principle, Linked Data refers to a set of best practices for publishing and interlinking structured data, thereby enabling it to be accessed by both humans and machines. Data interchange follows the RDF family of standards, and SPARQL is used for querying. The key technologies that support Linked Data are:

• URIs: any concept or entity can be identified by assigning a specific URI to it.

• HTTP for retrieving resources and their descriptions.

• RDF, a generic graph-based data model used for structuring and linking data that describes concepts or entities in the real world.

• SPARQL, the standard RDF query language.
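To make the triple model and the role of SPARQL concrete, the sketch below stores a tiny RDF graph as Python tuples of URIs and evaluates a basic graph pattern, the core of a SPARQL WHERE clause. It is a teaching sketch, not a SPARQL engine: there are no literals, OPTIONAL, FILTER or indexes. The example.org resources are hypothetical; the `rdf:type`, DCAT and Dublin Core namespace URIs are the standard ones.

```python
from __future__ import annotations

Triple = tuple[str, str, str]
Binding = dict[str, str]

# Standard vocabulary URIs; the example.org resources are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"
EX = "http://example.org/databio/"

# An RDF graph is a set of (subject, predicate, object) triples.
GRAPH: set[Triple] = {
    (EX + "dataset/field-boundaries", RDF_TYPE, DCAT_DATASET),
    (EX + "dataset/field-boundaries", DCT_PUBLISHER, EX + "org/partner-a"),
    (EX + "dataset/yield-maps", RDF_TYPE, DCAT_DATASET),
    (EX + "dataset/yield-maps", DCT_PUBLISHER, EX + "org/partner-b"),
}


def match(pattern: Triple, triple: Triple, binding: Binding) -> Binding | None:
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable matches anything, but must agree with earlier bindings.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(graph: set[Triple], patterns: list[Triple]) -> list[Binding]:
    """Evaluate a basic graph pattern: all patterns must match under one binding."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        extended: list[Binding] = []
        for binding in bindings:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# Roughly the SPARQL query:
#   SELECT ?ds WHERE { ?ds a dcat:Dataset ;
#                          dct:publisher <http://example.org/databio/org/partner-b> }
rows = query(GRAPH, [("?ds", RDF_TYPE, DCAT_DATASET),
                     ("?ds", DCT_PUBLISHER, EX + "org/partner-b")])
print(rows)
```

Because every resource is a URI, the same subject can be joined across patterns (here `?ds`), and triples from other publishers using the same URIs could be merged into the graph without any schema mapping.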

Due to the growing popularity of Linked Data, more detailed guidelines for the development and delivery of open data as Linked Data have been defined. For instance, for open government data, the recommended best practices include the following (more detailed information was given in D4.i1):

• To prepare the stakeholders.

• To select a reusable dataset.

• To model data objects and their relations to represent Linked Data.

• To specify an appropriate license to ease data reuse.

• To use a well-considered URI naming strategy and implementation plan.

• To describe the objects with previously defined vocabularies.

• To convert data into a Linked Data representation by scripting or other automated processes.

• To provide machine access to the Linked Data.

• To announce new datasets on authoritative domains to initiate an implicit social contract.

• To maintain the Linked Data once it is published.

Note that even though these best practices were conceived for open government data, they apply generally in other domains.

Regarding the publication process, there are at least three well-known life cycle models for publishing Linked Data (Hyland et al., Hausenblas et al., Villazón-Terrazas et al.). All of these models identify common needs for specifying, modelling and publishing data in the standard open Web format. However, even though all of the models deal with broadly similar tasks in publishing Linked Data, they differ in how those tasks are defined. A detailed description of these models is available in D4.i1. For our work, we are mainly interested in the model proposed by Villazón-Terrazas et al., which includes the following activities:

• Specification:

o Identification and analysis of the data sources to be published.

o Reusing or leveraging data that has already been opened/published.

o Assigning meaningful URIs rather than opaque ones whenever possible.

o Defining the license of the data sources, reusing existing licenses whenever possible.

• Modelling:

o Expressing ontologies in either OWL or RDF(S).

o Reusing existing and available vocabularies.

o Reusing available non-ontological resources.

• Generation:

o Transformation of the specified data sources (e.g. CSV files and spreadsheets, relational databases or XML) into RDF according to the modelled vocabulary, using suitable conversion tools.

o Pre-processing and/or post-processing tasks for fixing accessibility issues, reasoning issues, etc.

o Linking with other suitable datasets by discovering relationships between data items and expressing them with valid properties.

• Publishing:

o Dataset publication using RDF stores (e.g. OpenLink Virtuoso Universal Server, Jena, Sesame, 4store, YARS, OWLIM), a SPARQL endpoint and a Linked Data front end (e.g. Pubby, Talis Platform, Fuseki).

o Metadata publication using VoID, which allows expressing metadata about RDF datasets, and OPM (Open Provenance Model).

o Dataset discovery by registering the datasets in the CKAN2 registry and generating sitemap files for the dataset using sitemap4rdf.

• Exploitation:

o Application and exploitation of the Linked Data for various purposes and applications across different Web platforms.
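The Generation activity above can be made concrete with a small conversion script. The following Python sketch transforms CSV rows into RDF in N-Triples syntax, minting meaningful URIs from a key column rather than opaque identifiers. It is a minimal sketch under stated assumptions, not part of the Villazón-Terrazas et al. methodology: the base URI `http://example.org/databio/`, the `Field` class and the `crop` property are hypothetical, and a production pipeline would typically rely on an RDF library and the vocabularies chosen during Modelling.

```python
import csv
import io
import re

# Hypothetical base URI and vocabulary terms, chosen for illustration only.
BASE = "http://example.org/databio/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FIELD_CLASS = "http://example.org/vocab/Field"
COLUMN_TO_PROPERTY = {
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "crop": "http://example.org/vocab/crop",
}


def slug(value: str) -> str:
    """Turn a human-readable key into a meaningful (non-opaque) URI segment."""
    return re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")


def literal(value: str) -> str:
    """Serialise a plain string literal with N-Triples escaping."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text: str) -> list[str]:
    """Convert CSV rows into N-Triples lines according to the mapping above."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}field/{slug(row['name'])}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{FIELD_CLASS}> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            if row.get(column):
                triples.append(f"{subject} <{prop}> {literal(row[column])} .")
    return triples


if __name__ == "__main__":
    for line in csv_to_ntriples("name,crop\nNorth Field,wheat\n"):
        print(line)
```

The subject URI `.../field/north-field` is derived from the row's name, so it stays readable and stable across re-runs as long as the key column does not change.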

Open Micka [REF-14] is a web application for the management and discovery of geospatial metadata (open source under the BSD license). It has been extended and applied in the DataBio project, in particular in the Agriculture pilot 1 pipeline "Metadata, linked data and graph data".

Features of the application:

• OGC Catalogue service (CSW 2.0.2)

2 https://ckan.org/


• Transactions and harvesting

• Metadata editor

• Multilingual user interface

• ISO AP 1.0 profile

• Feature catalogue (ISO 19110)

• Interactive management of metadata profiles

• WFS/Gazetteer for defining the metadata extent

• GEMET thesaurus built-in client

• INSPIRE registry built-in client

• OpenSearch

• INSPIRE ATOM download service, created automatically from metadata
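Since Open Micka exposes an OGC Catalogue Service (CSW 2.0.2), clients can search it with standard GetRecords requests. The Python sketch below builds such a request in the KVP (HTTP GET) encoding with a CQL full-text filter on the `AnyText` core queryable. The endpoint URL is a hypothetical placeholder, and the parameters a given Micka installation actually accepts (output schemas, constraint languages) should be checked against its GetCapabilities response.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the CSW URL of an actual Micka installation.
CSW_ENDPOINT = "https://example.org/micka/csw"


def csw_get_records_url(endpoint: str, text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request in KVP (GET) encoding."""
    # CQL string literals escape a single quote by doubling it.
    safe_text = text.replace("'", "''")
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "namespace": "xmlns(csw=http://www.opengis.net/cat/csw/2.0.2)",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "maxRecords": max_records,
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Full-text filter on the OGC core queryable AnyText.
        "constraint": f"AnyText like '%{safe_text}%'",
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(csw_get_records_url(CSW_ENDPOINT, "soil"))
```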

2.4.3 Industrial data spaces

The Industrial Data Space (IDS), renamed International Data Space(s) in April 2018, is both a research project and a non-profit user association (IDSA). IDS extends a data marketplace with the ability to run services inside the IDS, e.g. data analysis and processing operations. The core requirements for an IDS related to data access are described in the Industrial Data Space whitepaper.3

• Data sovereignty: It is always the data owner that specifies the terms and conditions of use of the data provided.

• Decentral data management: Data management remains with the respective data owner, if desired.

• Data economy: Data is viewed as an economic asset. It can be divided into three categories: private data, so-called »club data« (i.e. data belonging to a specific value creation chain, which is available to selected companies only), and public data (weather information, traffic information, geo data etc.).

• Easy linkage of data: Linked-data concepts and common vocabularies facilitate the integration of data between participants.

• Trust: All participants, data sources, and data services of the Industrial Data Space are certified against commonly defined rules.

• Secure data supply chain: Data exchange is secure across the entire data supply chain, i.e. from data creation to data capture to data usage.

A “Data User” that wants to access data in an IDS must comply with a set of requirements specified by the “Data Provider” and the IDS. These requirements may include payment, standards for data protection, the period of use, and restrictions on aggregation levels and on sharing with other parties.
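The requirements a Data User must meet can be thought of as a usage policy that is evaluated before access is granted. The following Python sketch checks a request against such a policy covering payment, period of use, aggregation level and onward sharing. The class names, fields and the ordering of aggregation levels are hypothetical and do not reproduce the IDS information model; data protection requirements are not modelled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UsagePolicy:
    """Hypothetical terms set by a Data Provider (not the IDS information model)."""
    price_eur: float
    valid_from: date
    valid_until: date
    min_aggregation: str   # coarsest level at or above which data may be used
    sharing_allowed: bool


@dataclass
class AccessRequest:
    """Hypothetical description of what a Data User intends to do."""
    paid_eur: float
    use_date: date
    aggregation: str
    will_share: bool


# Ordered from finest to coarsest granularity (an assumption for this sketch).
AGGREGATION_LEVELS = ["plot", "field", "farm", "region"]


def violations(policy: UsagePolicy, request: AccessRequest) -> list[str]:
    """Return the requirements the Data User does not meet (empty list = access OK)."""
    problems = []
    if request.paid_eur < policy.price_eur:
        problems.append("payment")
    if not policy.valid_from <= request.use_date <= policy.valid_until:
        problems.append("use period")
    if (AGGREGATION_LEVELS.index(request.aggregation)
            < AGGREGATION_LEVELS.index(policy.min_aggregation)):
        problems.append("aggregation level")
    if request.will_share and not policy.sharing_allowed:
        problems.append("sharing")
    return problems
```

Returning the list of unmet requirements, rather than a single yes/no, lets a provider explain to the Data User why access was refused.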

3 http://www.industrialdataspace.org/wp-content/uploads/2016/09/whitepaper-industrial-data-space-eng.pdf


An example of an IDS solution is the Estonian Cybernetica platform4. Cybernetica provides solutions for sea surveillance, customs declaration management, data sharing, voting and a number of other applications.

2.4.4 Openness and payment

Openness with respect to data is not a binary concept; there can be degrees of openness in data access (which parties are eligible, and under which conditions data can be accessed). With the diffusion of IoT-enabled sensors and machines, some solutions for data storage and payment have adopted blockchain technologies. In addition to secure storage, this approach lets data consumer services purchase data from providers using blockchain-based payments. Datum (https://datum.org) is an example of a data marketplace following this approach, as illustrated in Figure 2.

Figure 2: How distributed storage and payments work

2.4.5 UXP – Exchange Platform - Cybernetica

Unified eXchange Platform (UXP) is a technology that enables peer-to-peer data exchange over encrypted and mutually authenticated channels. It is based on a decentralised architecture in which each peer's information system is connected with the systems of other peers.

4 https://cyber.ee/en/


UXP was created by the authors of X-Road, the e-Government data exchange system of Estonia, which according to the World Bank Development Report is what allowed Estonia to become a truly digital society.

According to Cybernetica, UXP-based solutions have been implemented across four continents, running online government services for 35 million people from different countries and cultures. The technology is designed to fit into an existing ecosystem, with full integration support and minimal changes required.

Seamless data exchange: UXP connects any number of databases in an efficient and secure way, helping organizations build a network of agreements that allows controlled exchange between any members of their ecosystem.

UXP benefits:

• Less is More. UXP means less paperwork, less bureaucracy and less time spent on futile tasks. In Estonia, digital services save every citizen one work-week per year.

• Affordable. UXP can be implemented in any ecosystem, be it a small country or a supranational association. With very low maintenance costs and marginal implementation investment, UXP is cost-effective and allows adopters to move ahead one step at a time.

• Reliable. UXP has been heavily tried and tested since its launch as Estonia’s X-Road in 2001. According to Cybernetica, no downtime has been observed since, and the system survived the world’s first cyber conflict in 2007.

• Secure. Extensive security measures are used to protect the data and guarantee its integrity. UXP is secure by design, as its decentralised architecture has no single point of failure. All traffic is encrypted with 2048-bit keys. These are the minimal requirements of the system; cryptographic algorithms can be altered to provide even stronger encryption at the customer's request.

• Scalable. UXP scales to any size of infrastructure. An unlimited number of security servers can be linked together, making it fit for local and international applications.

• Private. UXP uses a distributed architecture, avoiding the creation of a superdatabase that could be prone to exploitation. All transactions are signed and timestamped, making it possible to monitor all queries made by officials against private citizens.

2.5 Industrial data spaces and connectors

The “Industrial Data Space” is a virtual data space using standards and common governance models to facilitate the secure exchange and easy linkage of data in business ecosystems. It thereby provides a basis for creating and using smart services and innovative business processes, while at the same time ensuring digital sovereignty of data owners.

The following section introduces the concept of the Industrial Data Space Connector, quoting from the Reference Architecture Model for the Industrial Data Space published by the Fraunhofer-Gesellschaft in cooperation with the Industrial Data Space Association. We introduce Connectors at the functional level5.

Figure 3 shows the Functional Architecture of the Industrial Data Space. Irrespective of existing technologies and applications, it defines the functional requirements of the Industrial Data Space and the resulting features to be implemented.

Figure 3: Functional Architecture of the Industrial Data Space

The Connector is the central functional entity of the Industrial Data Space. It facilitates the exchange of data between participants. The Connector is basically a dedicated communication server for sending and receiving data in compliance with the Connector specification (see Section 3.5.1 in the Reference Architecture Model). A single Connector can be understood as a node in the peer-to-peer architecture of the Industrial Data Space. This means that a central authority for data management is not required. Connectors can be installed, managed and maintained both by Data Providers and Data Consumers. Typically, a Connector is operated in a secure environment (e.g., beyond a firewall). This means that internal systems of an enterprise cannot be directly accessed. However, the Connector can, for example, also be connected to a machine or a transportation vehicle. Each company participating in the Industrial Data Space may operate several Connectors. As an option, intermediaries (i.e., the Service Provider) may operate Connectors on behalf of one or several participating organizations. The data exchange with the enterprise systems must be established by the Data Provider or the Data Consumer.

Data Providers can offer data to other participants of the Industrial Data Space. The data therefore has to be described by metadata. The metadata contains information about the Data Provider, the syntax and semantics of the data itself, and additional information (e.g., pricing information or usage policies). To support the creation of metadata and the enrichment of data with semantics, vocabularies can be created and stored for other participants in the Vocabulary and Metadata Management component. If the Data Provider wants to offer data, the metadata will automatically be sent to one or more central metadata repositories hosted by the Broker. Other participants can browse and search data in this repository. Connectors can be extended with software components that help transform and/or process data. These Data Apps constitute the App Ecosystem. Data Apps can either be purchased via the App Store or developed by the participants themselves. App Providers may implement and provide Data Apps using the App Store. Every participant possesses identities required for authentication when communicating with other participants. These identities are managed by the Identity Management component. The Clearing House logs each data exchange between two Connectors.

5 For further details related to the other layers of the Reference Architecture Model, please refer to the official document: https://www.fraunhofer.de/content/dam/zv/de/Forschungsfelder/industrial-data-space/Industrial-Data-Space_Reference-Architecture-Model-2017.pdf
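The interplay of Connectors, Broker and Clearing House described above can be illustrated with a toy in-memory model. In this Python sketch, offering data registers its metadata with a Broker, other participants search the Broker, data flows directly between two Connectors, and a Clearing House logs each exchange. All classes are hypothetical simplifications: identity management, usage policies, Data Apps and secure transport are omitted, and nothing here reflects the actual Connector specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Broker:
    """Hypothetical in-memory metadata repository standing in for the IDS Broker."""
    catalogue: dict[str, dict] = field(default_factory=dict)

    def register(self, connector_id: str, metadata: dict) -> None:
        self.catalogue[metadata["id"]] = {**metadata, "connector": connector_id}

    def search(self, keyword: str) -> list[str]:
        """Return dataset ids whose keywords match (case-insensitive)."""
        return [ds_id for ds_id, md in self.catalogue.items()
                if keyword.lower() in [k.lower() for k in md.get("keywords", [])]]


@dataclass
class ClearingHouse:
    """Hypothetical log of every exchange between two Connectors."""
    log: list[tuple[str, str, str, str]] = field(default_factory=list)

    def record(self, provider: str, consumer: str, dataset_id: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append((stamp, provider, consumer, dataset_id))


@dataclass
class Connector:
    """Hypothetical peer node; no central data store is involved."""
    connector_id: str
    broker: Broker
    clearing_house: ClearingHouse
    datasets: dict[str, bytes] = field(default_factory=dict)

    def offer(self, metadata: dict, payload: bytes) -> None:
        # Offering data sends its metadata (not the data itself) to the Broker.
        self.datasets[metadata["id"]] = payload
        self.broker.register(self.connector_id, metadata)

    def fetch_from(self, provider: "Connector", dataset_id: str) -> bytes:
        # Peer-to-peer transfer; the Clearing House logs the exchange.
        payload = provider.datasets[dataset_id]
        self.clearing_house.record(provider.connector_id, self.connector_id, dataset_id)
        return payload
```

Note that the Broker only ever holds metadata; the payload stays with the providing Connector until a consumer fetches it directly.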

2.5.1 EU Data Portal

The European Union Open Data Portal (EU ODP) [REF-02] gives access to open data published by EU institutions and bodies. All data available via this catalogue are free to use and reuse for commercial or non-commercial purposes. They can be reused in databases, reports or projects. A variety of digital formats are available from the EU institutions and other EU bodies. As of July 2018, a total of 12418 datasets were available.

By providing easy access to data free of charge, the portal aims to help organizations use the data in innovative ways and unlock its economic potential. The portal is also designed to make the EU institutions and other bodies more open and accountable.

The data concerned include geographic, geopolitical and financial data; statistics; election results; legal acts; and data on crime, health, the environment, transport and scientific research.

The portal provides:

• a standardised catalogue, giving easier access to EU open data;

• a list of apps and web tools reusing these data;

• a SPARQL endpoint query editor;

• REST API access;

• tips on how to make best use of the site (see the Search and SPARQL manuals).
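The portal's SPARQL endpoint can be queried programmatically following the W3C SPARQL 1.1 Protocol. The Python sketch below builds a GET request for titles of DCAT datasets containing a keyword, and extracts the titles from a SPARQL JSON results document. The endpoint URL is a placeholder (the actual address is given in the portal's SPARQL manual), and the query assumes the catalogue is described with DCAT and Dublin Core terms.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: use the endpoint address given in the portal's SPARQL manual.
SPARQL_ENDPOINT = "https://example.org/sparql"

# Assumes datasets are described with DCAT and Dublin Core terms.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
  FILTER(CONTAINS(LCASE(STR(?title)), "agriculture"))
}
LIMIT 10
"""


def sparql_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol 'query via GET', asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def titles(results: dict) -> list[str]:
    """Extract ?title bindings from a SPARQL 1.1 Query Results JSON document."""
    return [b["title"]["value"] for b in results["results"]["bindings"] if "title" in b]
```

Once the placeholder is replaced with the real endpoint, the request can be sent with `urllib.request.urlopen` and the response body parsed with `json.load` into the dictionary that `titles` expects.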

2.5.2 GEOSS

The Group on Earth Observations (GEO) [REF-12] works to connect the demand for sound and timely environmental information with the supply of data and information about the Earth that is collected through observing systems and made available by the GEO community.

GEOSS (Global Earth Observation System of Systems) is a set of coordinated, independent Earth Observation, information and processing systems that interact and provide access to diverse information for a broad range of users in both the public and private sectors. It facilitates the sharing of environmental data and information collected from the large array of observing systems contributed by countries and organizations within GEO.

The ‘GEOSS Portal’ offers a single Internet access point for users seeking data, imagery and analytical software packages relevant to all parts of the globe. It connects users to existing databases and portals and provides reliable, up-to-date and user-friendly information, which is vital for the work of decision makers, planners and emergency managers.

DataBio aims to add its datasets that are suitable for GEOSS to the GEOSS portal.

2.5.3 DCAT and GeoDCAT

GeoDCAT-AP is a geospatial extension of DCAT-AP (the DCAT application profile for data portals in Europe). DCAT-AP is a metadata profile meant to provide an interchange format for data portals operated by EU Member States. It is based on and compliant with the W3C Data Catalog Vocabulary (DCAT), an RDF vocabulary designed to facilitate interoperability between data catalogues published on the Web. By using DCAT to describe datasets in catalogues, publishers increase discoverability and enable applications to consume metadata from multiple catalogues. DCAT also enables decentralised publishing of catalogues and facilitates federated dataset search across them.

GeoDCAT-AP was developed in the framework of the EU programme “Interoperability Solutions for European Public Administrations” (ISA). It is meant to provide a DCAT-AP compliant representation of the set of metadata elements included in INSPIRE metadata and in the core profile of ISO 19115:2003. GeoDCAT-AP objectives:

• The GeoDCAT-AP specification does not replace the INSPIRE Metadata Regulation nor the INSPIRE Metadata Technical Guidelines based on ISO 19115:2003 and ISO 19119 [REF-13]

• Its purpose is to give owners of geospatial metadata the possibility to achieve more by providing an additional RDF syntax binding

• Its basic use case is to make spatial datasets, data series and services searchable on general data portals, thereby making geospatial information more easily searchable across borders and sectors
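To illustrate what a DCAT-AP dataset description with a geospatial extent can look like, the following Python sketch serialises a minimal record in Turtle, expressing the bounding box as a `locn:geometry` WKT literal attached to a `dct:spatial` location, which is how GeoDCAT-AP represents spatial coverage. It is a simplified sketch: the dataset URI, title and access URL in the test are hypothetical, only a handful of properties are emitted, and a full GeoDCAT-AP record (contact points, reference system, lineage, etc.) would contain considerably more.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct:  <http://purl.org/dc/terms/> .\n"
    "@prefix locn: <http://www.w3.org/ns/locn#> .\n"
    "@prefix gsp:  <http://www.opengis.net/ont/geosparql#> .\n"
)


def turtle_string(value: str) -> str:
    """Quote a Turtle short string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def geodcat_record(uri: str, title: str, description: str, keywords: list[str],
                   bbox: tuple[float, float, float, float], access_url: str) -> str:
    """Serialise a minimal dataset description with a WKT bounding box."""
    west, south, east, north = bbox
    # Longitude/latitude order, matching the GeoSPARQL default CRS (CRS84).
    wkt = (f"POLYGON(({west} {south},{east} {south},{east} {north},"
           f"{west} {north},{west} {south}))")
    keyword_list = ", ".join(turtle_string(k) for k in keywords)
    body = (
        f"\n<{uri}> a dcat:Dataset ;\n"
        f"    dct:title {turtle_string(title)} ;\n"
        f"    dct:description {turtle_string(description)} ;\n"
        f"    dcat:keyword {keyword_list} ;\n"
        "    dct:spatial [ a dct:Location ;\n"
        f'        locn:geometry "{wkt}"^^gsp:wktLiteral ] ;\n'
        "    dcat:distribution [ a dcat:Distribution ;\n"
        f"        dcat:accessURL <{access_url}> ] .\n"
    )
    return PREFIXES + body
```

Because the output is plain DCAT with one extra geometry property, a general data portal that does not understand the spatial part can still index the title, description and keywords.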

2.5.4 CKAN

CKAN [REF-15] is one of the world’s leading open source data portal platforms. It is a data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. Once the data is published, users, whether they are developers, journalists, researchers, NGOs or citizens, can use its faceted search features to browse and find the data they need, and preview it using maps, graphs and tables.
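CKAN exposes its functionality through an HTTP Action API (version 3), in which each action is called at `/api/3/action/<action name>` and returns a JSON envelope with `success` and `result` fields. The Python sketch below builds a `package_search` URL and extracts dataset names from a response body. The CKAN base URL is a hypothetical placeholder, and the response used in the test is a hand-written stub, not real portal output.

```python
import json
from urllib.parse import urlencode

# Hypothetical CKAN instance; the Action API lives under /api/3/action/.
CKAN_URL = "https://example.org"


def package_search_url(base: str, query: str, rows: int = 10) -> str:
    """URL for CKAN's package_search action (full-text search over datasets)."""
    return f"{base}/api/3/action/package_search?{urlencode({'q': query, 'rows': rows})}"


def dataset_names(response_body: str) -> list[str]:
    """Extract dataset names from a package_search JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return [pkg["name"] for pkg in payload["result"]["results"]]
```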

2.6 Others

OpenAire (https://www.openaire.eu/) offers a place to deposit open data and obtain a Digital Object Identifier (DOI) for the dataset. EU-funded projects are expected to add the open datasets they create to this portal, and this is also the intention of DataBio. An example of an OpenAire dataset is shown in Figure 4.


Figure 4: OpenAire

Dryad (https://datadryad.org), the Dryad Digital Repository, is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types.

Dryad’s vision is to promote a world where research data is openly available, integrated with the scholarly literature, and routinely re-used to create knowledge.

The Dryad mission is to provide the infrastructure for, and promote the re-use of, data underlying the scholarly literature.

Dryad is governed by a non-profit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations.

Publishers are encouraged to facilitate data archiving by coordinating the submission of manuscripts with the submission of data to Dryad.

Dryad originated from an initiative among a group of leading journals and scientific societies in evolutionary biology and ecology to adopt a joint data archiving policy (JDAP) for their publications, and from the recognition that easy-to-use, sustainable, community-governed data infrastructure was needed to support such a policy. An example from Dryad is shown in Figure 5.


Figure 5: DRYAD


3 Context view

The datasets, formats and models are identified, described and used within the context of the DataBio project.

3.1 External drivers for data sharing and data exchange

Data sharing involves at least two stakeholders who provide and/or consume mostly structured data about an entity (person, business, property or event). External regulations may set the rules and conditions of data sharing, i.e. how to provide or consume data. These conditions may have an enabling or a restrictive impact on data sharing processes. The most prominent regulation (legislation) is the GDPR, which sets the conditions for processing personal data. Personal data are any information relating to an identified or identifiable natural person. For example, to process personal data, the purpose of processing has to be defined (and validated), and the data consumer has to make sure that the data are only processed for the defined purpose. In a business context, it is not only legislation that sets the rules; contracts also define specific rulesets for processing data.
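The purpose-limitation rule in the GDPR example above can be supported in software by binding each personal-data record to the purposes declared when it was collected, and refusing any other use. The following Python sketch shows this idea only; the record structure and purpose strings are hypothetical, and GDPR compliance also involves legal bases, consent management, retention and auditing, none of which is modelled here.

```python
from dataclasses import dataclass
from typing import Any, Callable


class PurposeViolation(Exception):
    """Raised when personal data would be processed for an undeclared purpose."""


@dataclass(frozen=True)
class PersonalRecord:
    """Hypothetical record carrying the purposes validated at collection time."""
    subject_id: str
    data: dict
    allowed_purposes: frozenset[str]


def process(record: PersonalRecord, purpose: str,
            operation: Callable[[dict], Any]) -> Any:
    """Run `operation` on the data only if `purpose` was declared for this record."""
    if purpose not in record.allowed_purposes:
        raise PurposeViolation(f"{purpose!r} not declared for {record.subject_id}")
    return operation(record.data)
```

Checking the purpose at the single point where data is handed to an operation keeps the rule from being bypassed by individual processing routines.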

Such regulations and rules have two main impacts. Firstly, to enable data sharing, the infrastructure (software) must ensure compliance with external requirements and rules, such as the GDPR. Secondly, the data sharing process needs to be defined and specified according to those regulations. While both initially imply costs and effort, in the long run they enable trust and long-term collaboration within a community such as the bioeconomy. Furthermore, regulations and activities of public bodies, such as Open Data Policies, can create a trustful environment for data sharing.

Besides regulations and rules, data sharing also depends on the knowledge domain, application scenario and intended use, since these determine how data is represented, stored and published. Data may be intended for human users or for machine processing. Data can come in very diverse formats, and both its modality (text, image, video, audio) and its structural level (unstructured, semi-structured or structured) may be geared to a specific purpose. Both affect how data is provided and consumed. Furthermore, data, datasets, knowledge bases and knowledge building blocks are often not stable: they are successively expanded and versioned, and increasingly developed collaboratively and in a decentralised manner. Depending on the stability and size of the datasets, data is either materialised or computed by processing routines, and made available via APIs. Where data is made accessible, for example under Open Data principles, the target group of data users must be identified, possible business models defined, license requirements specified or adopted, and the provenance and trustworthiness of the data disclosed.

Data that is untrustworthy, or whose usability is in question, is practically unusable, at least in a professional environment. This heterogeneity presents data publishers/data owners and data users with major problems. Various initiatives (W3C, Go-FAIR, DCMI) recommend the use of specially designed metadata. These initiatives are important external drivers that have an impact on data sharing in the data economy, and especially in the bioeconomy. The clearer the process and requirements of data sharing, the more likely users are to succeed.


As a rough measure of the quality and sustainability of published data, the 5-star scheme proposed by Tim Berners-Lee can be used [REF-16]:

★ Make your data available on the Web under an open license. The format does not matter.

★★ Provide data in a structured format (e.g. Excel instead of a scanned image of a spreadsheet).

★★★ Use open, non-proprietary formats (e.g. CSV instead of Excel).

★★★★ Use URIs to label things so your data can be linked.

★★★★★ Link your data with other data to create context.
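Because each star in the scheme builds on the previous one, a rating can be computed as the number of consecutive criteria met, starting from the first. The following Python sketch encodes that reading; the boolean field names are hypothetical labels for the five criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical flags for the five criteria of the 5-star scheme."""
    on_web_open_license: bool
    structured: bool          # e.g. Excel rather than a scanned image
    open_format: bool         # e.g. CSV rather than Excel
    uses_uris: bool           # things are identified by URIs
    links_to_other_data: bool


def stars(p: Publication) -> int:
    """Count stars; a level only counts if every level below it is also met."""
    levels = [p.on_web_open_license, p.structured, p.open_format,
              p.uses_uris, p.links_to_other_data]
    count = 0
    for met in levels:
        if not met:
            break
        count += 1
    return count
```

For example, a CSV file published under an open license without URIs scores three stars, while well-linked RDF without an open license scores zero.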

The W3C, for example, offers a large set of recommendations on the formats, languages and vocabularies to use for designing and linking data and metadata (RDF, RDFS, OWL, SPARQL, SHACL). Furthermore, the W3C has published best practices for data to be published on the Web (https://www.w3.org/TR/dwbp/). These recommend, for example, providing metadata for both human users and computer applications, and describing the overall features of the dataset as well as the schema and internal structure of its distributions.

Further, as described in Section 2.2, the Go-FAIR Initiative [REF-17] developed a structured guideline for publishing data sustainably. It uses four categories: firstly, “To be Findable”, which mainly sets recommendations on identifiers and “rich metadata”; secondly, “To be Accessible”, which refers to the use of standards and well-designed protocols; thirdly, “To be Interoperable”, with guidelines to ensure quality and transparent representations; and fourthly, “To be Re-usable”, which makes sure the data can be accessed and provided sustainably.

In addition to these guides, the Dublin Core Metadata Initiative [REF-18] offers a variety of vocabularies in different formats for describing metadata related to raw data and data aggregates. Particular emphasis is placed on the provision of:

• Authors and contributors

• Description of the data in text

• Categorization

• License information

• Versioning and updating rules
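As an illustration of the elements listed above, the following Python sketch assembles a minimal JSON-LD metadata record using DCMI Metadata Terms: `dct:creator` and `dct:contributor` for authors and contributors, `dct:description`, `dct:subject` for categorisation, `dct:license`, and `dct:modified`, `dct:accrualPeriodicity` and `dct:replaces` for versioning and updating. The mapping of elements to terms is an assumption for this sketch, the example identifiers are hypothetical, and the frequency URI comes from the DCMI Collection Description Frequency Vocabulary.

```python
import json
from typing import Optional

DCT = "http://purl.org/dc/terms/"


def dc_record(identifier: str, title: str, creators: list[str],
              contributors: list[str], description: str, subjects: list[str],
              license_uri: str, modified: str, periodicity_uri: str,
              previous_version: Optional[str] = None) -> dict:
    """Minimal JSON-LD record using DCMI Metadata Terms (dct:)."""
    record = {
        "@context": {"dct": DCT},
        "@id": identifier,
        "dct:title": title,
        "dct:creator": creators,                         # authors
        "dct:contributor": contributors,                 # contributors
        "dct:description": description,                  # textual description
        "dct:subject": subjects,                         # categorisation
        "dct:license": {"@id": license_uri},             # license information
        "dct:modified": modified,                        # last update (ISO 8601)
        "dct:accrualPeriodicity": {"@id": periodicity_uri},  # update rule
    }
    if previous_version:
        # Versioning: point to the dataset version this one replaces.
        record["dct:replaces"] = {"@id": previous_version}
    return record


if __name__ == "__main__":
    example = dc_record(
        "http://example.org/dataset/yield-2018", "Crop yield 2018",
        ["Example Farm Cooperative"], ["Example Research Institute"],
        "Yield per field for the 2018 season.", ["agriculture", "crop yield"],
        "https://creativecommons.org/licenses/by/4.0/", "2018-12-12",
        "http://purl.org/cld/freq/annual",
        previous_version="http://example.org/dataset/yield-2017")
    print(json.dumps(example, indent=2))
```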

How concrete license information should be designed is currently not defined and is the subject of different research approaches. One structured definition of a license can be found in [REF-19].

Due to the domain-specific complexity and heterogeneity of data representation, there is no single recipe that leads the data economy of an application scenario to success. Rather, there is a collection of recommendations that address specific aspects of data design to ensure the sustainable usability of the data provided, and thus give users the greatest possible support.

3.2 Data interoperability through ontologies, models, formats and standards

DataBio aims to support data interoperability through the use of suitable standard ontologies and data models in the general domains of Geospatial and Earth Observation data and in the specific domains of Agriculture, Forestry and Fishery, and also to influence future standardisation where this is found feasible.

3.2.1 Geospatial and Earth Observation ontologies and standards

In the Geospatial domain, the DataBio project aims to use and extend the standards of OGC, ISO/TC 211 and INSPIRE, in particular with respect to the requirements of Big Data. In the Earth Observation domain, the objective is to use and extend the international Earth Observation standards and services/APIs, as described further in D5.1.

3.2.2 Agricultural ontologies and standards

In spring 2018, the DataBio project engaged in the newly established Agriculture Working Group of the OGC (Open Geospatial Consortium).

The mission of the OGC Agriculture Working Group is to identify geospatial interoperability issues and challenges within the agriculture domain, and then examine ways in which those challenges can be met through the application of existing OGC standards, or through the development of new geospatial interoperability standards under the auspices of OGC. The activities of the Working Group include:

• Examination of the possibilities for agricultural information exchange standard alignment and harmonization between UN/CEFACT, ISO TC 23, ISOBus, AgroXML, OGC, W3C, etc.

• Development of a reference architecture for use of OGC encoding and interface standards in common agricultural activities.

• Renewal of MOU with IUSS WGSIS for coordination on SoilML / ISO 28258 and related standards.

• Coordination with the agricultural interest groups within ESIP and RDA.

• Coordination and exchange with other related initiatives such as GEOSS, GODAN, CGIAR, GlobalGAP, Open Ag Data Alliance, etc.

• Organization of Agricultural Geoinformatics Summits at OGC Technical Committee meetings.

Through previous projects, DataBio partners have been engaged in the creation of ontologies and data models such as FOODIE6 and SENSLOG7. The FOODIE ontology extends the INSPIRE data model for the Agricultural and Aquaculture Facilities theme. These ontologies and data models will be used in the DataBio project and related to the emerging standardisation activities in the Agriculture area.

6 http://foodie-cloud.github.io/model/FOODIE.html

7 https://sdi4apps.eu/2016/11/opensensorsnetwork-pilot-senslog-api-for-farmtelemetry-module/

3.2.3 Forestry ontologies and standards

Forest information is standardised so that actors engaged in the forest sector can develop and use harmonised information systems. There are several parallel, successive actors in the forest sector value chain who have to exchange information when implementing measures. Although the basic concepts and measurement units of forestry have been carefully defined for decades, until recent years almost every actor implemented them differently in their information systems. As a result, it has been difficult or almost impossible to convert information and transfer it from one system to another. Forest information standards facilitate the use of open materials and data transfer between actors, which in turn improves operational efficiency in the forest sector.

The forest information standards website is maintained by the Finnish Forest Centre and the Forestry Development Centre Tapio8.

The forest information standards used by the information systems have been published as XML Schema documentation. The schemas define the structure and content of the information so that different information systems can exchange standardised information.
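Exchanging information according to an XML schema means that a receiving system can map standardised elements into its own data structures. The following Python sketch reads a small stand-level message with the standard library's ElementTree. The namespace, element names and species codes are hypothetical stand-ins, not the actual Finnish forest information schemas, and ElementTree does not validate against an XML Schema (a library such as lxml, with its `XMLSchema` class, would be needed for that).

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names standing in for a real forest
# information schema; they are NOT taken from metsatietostandardit.fi.
NS = {"st": "http://example.org/forest-standard/stand"}

MESSAGE = """<st:Stands xmlns:st="http://example.org/forest-standard/stand">
  <st:Stand id="S1"><st:Area unit="ha">2.4</st:Area><st:MainTreeSpecies>1</st:MainTreeSpecies></st:Stand>
  <st:Stand id="S2"><st:Area unit="ha">0.8</st:Area><st:MainTreeSpecies>2</st:MainTreeSpecies></st:Stand>
</st:Stands>"""


def read_stands(xml_text: str) -> list[dict]:
    """Map each standardised stand element to a plain dictionary."""
    root = ET.fromstring(xml_text)
    stands = []
    for stand in root.findall("st:Stand", NS):
        stands.append({
            "id": stand.get("id"),
            "area_ha": float(stand.findtext("st:Area", namespaces=NS)),
            "species_code": stand.findtext("st:MainTreeSpecies", namespaces=NS),
        })
    return stands


if __name__ == "__main__":
    for stand in read_stands(MESSAGE):
        print(stand)
```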

The available Forest Information Standards include a standard forestry data model, a standard for special features data, a standard for forestry and micro-stand forestry information, a standard set of messages for wood trade and forestry service trade, a standard for wood and timber statistics, and Forest Centre messages for official use. The new official standard messages published in 2018 include a set of messages for wood harvesting and forest management, as well as self-monitoring messages.

The standardisation forum is currently working on a forest data update message, and the first official version of this message is to be released during 2018. Additionally, in autumn 2018, the compliance of the forest information standards with the Y platform developed by the Population Register Centre will be explored, a redesign of the wide-ranging special feature codes will be planned, and the interface between the digitised forest management recommendations and forest information standardisation will be considered.

3.2.4 Fishery ontologies and standards

There are fewer established ontologies and standards in the fishery domain, but the Food and Agriculture Organization of the United Nations (FAO) has established a Fisheries Glossary.

8 Forestry oriented standards - https://www.metsatietostandardit.fi/en/.

The FAO Fisheries Glossary has been jointly upgraded by the Fisheries and Aquaculture Department and the Meeting Programming and Documentation Service. This upgrade stems from the need to make the glossary an integral part of the FAO Term Portal. It includes additional features, languages, and access to alternative definitions for currently existing terms in the FAO Term Portal. As of October 2014, the FAO Fisheries Glossary consists of approximately 1580 terms and definitions, grouped by subject area, with relevant language equivalents being developed as new terms are added (http://www.fao.org/faoterm/collection/fisheries/en/).

In addition, a recent CEN-CENELEC Workshop on Aquaculture may also be relevant for some of the DataBio activities (https://www.cen.eu/News/Workshops/Pages/WS-2016-14.aspx).

The UN/CEFACT FLUX (Fisheries Language for Universal eXchange) standard for information exchange is designed to overcome the barriers created by diverse national reporting standards.

Figure 6: The FLUX standards and status (from UN ESCAP presentation of Dr Heiner Lehr) [REF-37].

The types of data exchanged include:

• Information between stakeholders on stocks, quotas and catches

• Real time monitoring of vessel positions (VMS) and on-going fishing activities

• Reporting of fish landed and sales

• Vessel data and characteristics

• License and fishing authorisation requests


3.3 Data access through standard services and APIs

Besides taking advantage of existing standard services and interfaces in the geospatial and Earth Observation area, DataBio will also look into the use and promotion of suitable APIs for data access and other services.

3.3.1 Geospatial Standards, Data Types and Services

3.3.1.1 OGC View Services

View services make it possible to display, navigate, zoom to, or overlay spatial datasets, and to display legend information and any relevant metadata content (Directive 2007/2/EC, Art. 11.1 b).

A Web Map Service (WMS) provides geodata in the form of georeferenced images in raster or vector image formats, such as Portable Network Graphics (PNG), Graphics Interchange Format (GIF) or Scalable Vector Graphics (SVG). If the WMS is configured to support it (the optional GetFeatureInfo operation), it is also possible to query the attribute information of the features at a given image coordinate.
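The sketch below shows how a client could assemble a WMS 1.3.0 GetMap request in the key-value-pair (KVP) encoding using only the Python standard library. The endpoint URL and the layer name `ndvi` are hypothetical; the parameter names are those of the WMS 1.3.0 GetMap operation.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layers, bbox, width, height,
                   crs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL using KVP encoding.

    Note: in WMS 1.3.0 with CRS=EPSG:4326 the BBOX axis order is
    latitude, longitude: (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value = default style for every layer
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer name.
url = wms_getmap_url("https://maps.example.org/wms", ["ndvi"],
                     (49.1, 16.8, 49.3, 17.1), 800, 600)
print(url)
```

The returned URL can be fetched with any HTTP client; the response body is the rendered image in the requested format.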

The Web Map Tile Service (WMTS) enables applications to serve map tiles of spatially referenced data using tile images with predefined content, extent and resolution. It can be used to develop scalable, high-performance services for the web-based distribution of cartographic maps.
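Because WMTS tiles have a predefined extent and resolution, a client can compute which tile to request instead of asking the server to render an image. The sketch below assumes a Web Mercator tile matrix set laid out like the well-known GoogleMapsCompatible set (2^zoom by 2^zoom tiles, origin at the upper-left corner) and assumes its tile matrix identifiers are simply the zoom levels; the endpoint and layer name are hypothetical.

```python
import math
from urllib.parse import urlencode


def tile_indices(lon, lat, zoom):
    """Return (col, row) of the tile that contains a lon/lat position.

    Assumes a Web Mercator tile matrix set with 2**zoom x 2**zoom tiles and
    the origin at the upper-left corner, as in the well-known
    GoogleMapsCompatible set. Other tile matrix sets need the origin,
    extent and scale values advertised in the capabilities document.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    col = math.floor((lon + 180.0) / 360.0 * n)
    row = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that positions on the eastern/southern edge stay in range.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


def wmts_gettile_url(endpoint, layer, matrix_set, matrix, col, row,
                     style="default", image_format="image/png"):
    """Build a WMTS 1.0.0 GetTile request URL using KVP encoding."""
    params = {
        "SERVICE": "WMTS",
        "REQUEST": "GetTile",
        "VERSION": "1.0.0",
        "LAYER": layer,
        "STYLE": style,
        "TILEMATRIXSET": matrix_set,
        "TILEMATRIX": matrix,
        "TILEROW": row,
        "TILECOL": col,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer; tile matrix ids assumed to be "0", "1", ...
zoom = 10
col, row = tile_indices(10.75, 59.91, zoom)
print(wmts_gettile_url("https://tiles.example.org/wmts", "ortho",
                       "GoogleMapsCompatible", str(zoom), col, row))
```

Since every client computes the same tile indices, identical tiles can be cached by the server, proxies and browsers, which is what makes WMTS scale well.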

3.3.1.2 OGC Download Services

Download services enable copies of spatial datasets, or parts of such sets, to be downloaded and, where practicable, accessed directly (Directive 2007/2/EC, Art. 11.1 c). A download service supports either the complete transfer of a geodataset or access to individual objects. The downloaded data are available to the user on their own IT system and can be further processed if the appropriate rights have been granted.

A Web Feature Service (WFS) provides web-based access to vector-based objects (features). New data models should be based exclusively on GML version 3.2. A WFS may be limited to downloading predefined datasets, without the possibility of further individual queries or selection of the contents (see http://www.opengeospatial.org/standards/wfs).
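Where a WFS offers direct access rather than only predefined downloads, large feature sets are usually retrieved page by page. The sketch below generates WFS 2.0.0 GetFeature requests that use the standard COUNT and STARTINDEX parameters for paging. The endpoint, the feature type `ex:parcels` and the assumption that the total number of features is already known (for example from a request with RESULTTYPE=hits) are illustrative only.

```python
from urllib.parse import urlencode


def wfs_getfeature_urls(endpoint, type_name, bbox, page_size, total):
    """Yield WFS 2.0.0 GetFeature URLs that page through `total` features.

    COUNT caps the number of features per response and STARTINDEX (0-based)
    selects the first feature of each page. The BBOX axis order follows the
    CRS of the request (here assumed to be the server's default CRS).
    """
    for start in range(0, total, page_size):
        params = {
            "SERVICE": "WFS",
            "VERSION": "2.0.0",
            "REQUEST": "GetFeature",
            "TYPENAMES": type_name,
            "BBOX": ",".join(str(v) for v in bbox),
            "COUNT": page_size,
            "STARTINDEX": start,
        }
        yield endpoint + "?" + urlencode(params)


# Hypothetical endpoint and feature type.
for url in wfs_getfeature_urls("https://geo.example.org/wfs", "ex:parcels",
                               (49.0, 16.0, 49.5, 17.0), 100, 250):
    print(url)
```

Paging keeps individual responses small, which matters when the default output format is verbose GML.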

The Web Coverage Service (WCS) provides access to georeferenced raster data (coverages), in particular multi-dimensional datasets that represent phenomena with spatial or temporal variability, e.g. Earth observation imagery, elevation models or temperature distributions (see http://www.opengeospatial.org/standards/wcs).
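Unlike a WMS, which returns a rendered picture, a WCS returns the underlying values, and a client typically cuts out ("trims") only the part of each axis it needs. The sketch below builds a WCS 2.0.1 GetCoverage request in KVP encoding with one SUBSET parameter per trimmed axis. The endpoint, the coverage identifier and the axis labels `Lat` and `Long` are hypothetical; real axis labels must be read from the DescribeCoverage response.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint, coverage_id, subsets,
                        image_format="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage request URL using KVP encoding.

    `subsets` maps axis labels to (low, high) trim intervals. Each trim is
    sent as its own SUBSET=axis(low,high) parameter, so the parameters are
    passed to urlencode as a list of pairs, which allows repeated keys.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
        ("FORMAT", image_format),
    ]
    for axis, (low, high) in subsets.items():
        params.append(("SUBSET", f"{axis}({low},{high})"))
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, coverage id and axis labels.
print(wcs_getcoverage_url("https://geo.example.org/wcs", "air_temperature",
                          {"Lat": (49.1, 49.3), "Long": (16.8, 17.1)}))
```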


3.3.1.3 Other Services

In addition to the OGC services and interfaces mentioned above, there are services dealing with geospatial data that do not implement these standards. In particular, for semi-structured or even unstructured data, other approaches may be more feasible.

Representational State Transfer (REST) does not describe a specific standard but rather an architectural style for distributed hypermedia systems. REST does not prescribe any specific protocol or data format. Nevertheless, HTTP and JSON are widely used for such services.

Vector Data Formats

Geography Markup Language (GML) is an XML-based format focused on, but not limited to, describing vector data. Since version 3 it has been possible to use extensions, e.g. for coverages. GML does not limit the description of geospatial objects to 2D and 3D data only, but also allows the inclusion of other information such as temporal data. GML is the preferred data format to be served by OGC Web Services.
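To show what a minimal GML 3.2 geometry looks like, the sketch below serializes a single point with Python's standard `xml.etree.ElementTree`. The identifier and coordinates are hypothetical; the namespace URI `http://www.opengis.net/gml/3.2`, the mandatory `gml:id` attribute and the `gml:pos` element follow GML 3.2. In a real WFS response such a point would be nested inside a feature defined by an application schema.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"
ET.register_namespace("gml", GML_NS)  # serialize with the usual prefix


def gml_point(point_id, lat, lon, srs="urn:ogc:def:crs:EPSG::4326"):
    """Serialize a single GML 3.2 Point.

    GML 3.2 requires a gml:id on geometry objects. For the EPSG::4326 URN
    the axis order is latitude, longitude, so gml:pos is "lat lon".
    """
    point = ET.Element(f"{{{GML_NS}}}Point",
                       {f"{{{GML_NS}}}id": point_id, "srsName": srs})
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = f"{lat} {lon}"
    return ET.tostring(point, encoding="unicode")


print(gml_point("p1", 59.91, 10.75))
```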

GeoJSON allows geospatial information to be described and exchanged using the JavaScript Object Notation (JSON). Positions are given as longitude and latitude, optionally followed by an elevation value, and a variety of geometry types is supported, such as Point, LineString and Polygon. Besides the geometric information, an object can hold additional properties describing the feature; such objects are called Feature objects. Furthermore, GeoJSON allows the definition of so-called FeatureCollections containing a set of different features.

Well-known text (WKT) is a simple text-based markup language for describing geospatial information. Originally defined by the OGC, it is currently specified by ISO/IEC 13249-3:2016 and ISO 19162:2015. Besides 2D and 3D (Z) coordinates, WKT can also express measure (M) values, which GeoJSON does not support. The format is widely used to add geospatial information to table-structured data such as SQL databases or CSV (comma-separated values) files.
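The two text formats are often combined in practice: tabular data carry geometries as WKT strings, and a service publishes them as GeoJSON. The sketch below converts CSV rows with a WKT column into a GeoJSON FeatureCollection using only the standard library. It deliberately handles only 2D and 3D (Z) POINT geometries; the column names, station names and coordinates are hypothetical, and the coordinates are assumed to be longitude/latitude so that the WKT x y order matches GeoJSON's [longitude, latitude].

```python
import csv
import io
import json
import re

# Matches 2D and 3D (Z) POINT geometries only, e.g. "POINT (10.75 59.91)"
# or "POINT Z (10.40 63.43 12.5)"; other WKT types are out of scope.
POINT_RE = re.compile(r"^\s*POINT\s*(Z)?\s*\(\s*([^)]*)\)\s*$", re.IGNORECASE)


def wkt_point_to_geojson(wkt):
    """Convert a WKT POINT into a GeoJSON Point geometry."""
    m = POINT_RE.match(wkt)
    if not m:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    coords = [float(c) for c in m.group(2).split()]
    if len(coords) != (3 if m.group(1) else 2):
        raise ValueError(f"wrong number of coordinates in {wkt!r}")
    # WKT lists x y (z); with lon/lat data this is GeoJSON's order too.
    return {"type": "Point", "coordinates": coords}


def csv_to_feature_collection(csv_text, geom_column="geom"):
    """Turn CSV rows with a WKT column into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        geometry = wkt_point_to_geojson(row.pop(geom_column))
        features.append({"type": "Feature", "geometry": geometry,
                         "properties": row})
    return {"type": "FeatureCollection", "features": features}


CSV_TEXT = """station,geom
A,POINT (10.75 59.91)
B,POINT Z (10.40 63.43 12.5)
"""

print(json.dumps(csv_to_feature_collection(CSV_TEXT), indent=2))
```

Note that WKT geometries containing commas (e.g. LINESTRING) must be quoted in CSV files; the CSV reader then handles them transparently.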

3.3.2 Sensor Standards, ontologies, data representations

3.3.2.1 OGC Sensor Observation Service

The Sensor Observation Service (SOS) is an OGC standard that describes web services to store and query real-time sensor data and sensor data time series. SOS is part of the Sensor Web Enablement (SWE) framework. The offered sensor data comprise descriptions of the sensors themselves, encoded in the Sensor Model Language (SensorML, see below), and the measured values, encoded in the Observations and Measurements (O&M) format. The web service and both encodings are open standards defined by the Open Geospatial Consortium (OGC). If the SOS supports the transactional profile (SOS-T), new sensors can be registered through the service interface and measured values can be inserted. An SOS implementation can be used for data from both in-situ and remote sensors, and the sensors can be either mobile or stationary.
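As an illustration of querying time series from an SOS, the sketch below builds an SOS 2.0 GetObservation request in the KVP binding, restricting results by phenomenon time with a temporal filter. The endpoint, the offering `soil_station_1` and the observed property `soil_moisture` are hypothetical; real identifiers are listed in the service's GetCapabilities response, and the exact KVP support may vary between SOS implementations.

```python
from urllib.parse import urlencode


def sos_get_observation_url(endpoint, offering, observed_property,
                            start, end):
    """Build an SOS 2.0 GetObservation request (KVP binding).

    The temporal filter keeps observations whose phenomenon time lies in
    the ISO 8601 interval start/end.
    """
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and identifiers.
print(sos_get_observation_url("https://sensors.example.org/sos/kvp",
                              "soil_station_1", "soil_moisture",
                              "2018-06-01T00:00:00Z", "2018-06-30T23:59:59Z"))
```

The response is an O&M document containing the matching observations.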


3.3.2.2 OGC Sensor Model Language

SensorML is an OGC standard that provides standard models and an XML encoding for describing sensors and measurement processes. SensorML can be used to describe a wide range of sensors, including both dynamic and stationary platforms and both in-situ and remote sensors. It provides a provider-centric view of information in a sensor web, which is complemented by Observations and Measurements (O&M), which provides a user-centric view. Supported functions include:

• sensor discovery,

• sensor geolocation,

• processing of sensor observations,

• a sensor programming mechanism,

• subscription to sensor alerts.

The latest version of the standard is 2.0, published in 2012 (see http://www.opengeospatial.org/standards/sensorml).

3.3.2.3 OGC SensorThings API

SensorThings API is an OGC standard providing an open and unified framework to interconnect IoT sensing devices, data and applications over the Web. It is an open standard addressing both the syntactic and the semantic interoperability of the Internet of Things, and it complements existing IoT networking protocols such as CoAP, MQTT, HTTP and 6LoWPAN. While these protocols address the ability of different IoT systems to exchange information, OGC SensorThings API addresses the ability of different IoT systems to use and understand the exchanged information. As an OGC standard, SensorThings API also allows easy integration into existing Spatial Data Infrastructures or Geographic Information Systems.

The latest version of the standard is 1.0, published in 2015.
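SensorThings API exposes its entities (Thing, Datastream, Observation, etc.) as REST resources with JSON representations and OData-style query options such as `$orderby` and `$top`. The sketch below builds a query for the latest observations of one Datastream and a JSON body for creating a new Observation linked to it. The service root URL, the Datastream id 42 and the measured value are hypothetical; no request is actually sent.

```python
import json
from urllib.parse import quote


def observations_query(base_url, datastream_id, top=10):
    """URL for the latest `top` observations of one Datastream.

    Uses the query options $orderby and $top; the space in
    "phenomenonTime desc" is percent-encoded.
    """
    options = f"$orderby={quote('phenomenonTime desc')}&$top={top}"
    return f"{base_url}/Datastreams({datastream_id})/Observations?{options}"


def observation_payload(datastream_id, phenomenon_time, result):
    """JSON body for POSTing a new Observation linked to a Datastream."""
    return json.dumps({
        "phenomenonTime": phenomenon_time,
        "result": result,
        "Datastream": {"@iot.id": datastream_id},
    })


# Hypothetical service root and Datastream id.
print(observations_query("https://iot.example.org/SensorThings/v1.0", 42))
print(observation_payload(42, "2018-06-01T12:00:00Z", 21.6))
```

The payload would be sent with an HTTP POST to the service's `Observations` collection.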

3.3.2.4 ISO 19156:2011 Geographic information - Observations and measurements

The O&M standard defines a conceptual schema for observations and for the features involved in sampling. It provides models for exchanging information that describes observation acts and their results, both within and between different scientific and technical communities. Observations commonly involve sampling of a feature of interest. The standard defines a common set of sampling feature types, classified primarily by topological dimension, as well as samples for ex-situ observations. The schema includes relationships between sampling features (sub-sampling, derived samples). The standard concerns only externally visible interfaces and places no restriction on the underlying implementations other than what is needed to satisfy the interface specifications in the actual situation.

The latest version of the standard was published in 2011.
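The core of the observation model can be sketched as plain Python data classes. This is a simplified, non-normative reading: the class and field names are our own (ISO 19156 uses names such as OM_Observation and SF_SamplingPoint), many optional properties (e.g. parameters, result quality, valid time) are omitted, and all identifiers and values in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class SamplingPoint:
    """A sampling feature that stands in for a larger domain feature."""
    identifier: str
    sampled_feature: str          # the domain feature being represented
    position: tuple[float, float]


@dataclass(frozen=True)
class Observation:
    """An act that uses a procedure to estimate the value of an observed
    property of a feature of interest."""
    procedure: str                # sensor or method used
    observed_property: str        # e.g. "air temperature"
    feature_of_interest: str      # often a sampling feature
    phenomenon_time: datetime     # when the observed value applies
    result_time: datetime         # when the result became available
    result: Any


# Hypothetical identifiers and value.
station = SamplingPoint("sp-1", "field-17", (49.21, 16.91))
t = datetime(2018, 6, 1, 12, 0, tzinfo=timezone.utc)
obs = Observation("thermometer-3", "air temperature", station.identifier,
                  t, t, 21.6)
print(obs)
```

Separating phenomenon time from result time matters, for example, for ex-situ samples analysed in a laboratory days after collection.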


3.3.2.5 W3C Semantic Sensor Network Ontology

This W3C ontology describes sensors, observations and related concepts. It does not describe domain concepts, time, locations, etc.; these are intended to be included from other ontologies via OWL imports. The ontology was developed by the W3C Semantic Sensor Network Incubator Group (SSN-XG) and is based around the concepts of systems, processes and observations. It supports the description of the physical and processing structure of sensors. Sensors are not constrained to physical sensing devices: rather, a sensor is anything that can estimate or calculate the value of a phenomenon, so a device, a computational process or a combination of both could play the role of a sensor. The representation of a sensor in the ontology links together what it measures (the domain phenomena), the physical sensor (the device) and its functions and processing (the models).

The SSN-XG version of the ontology was published in 2011. A revised version, together with the lightweight SOSA (Sensor, Observation, Sample, and Actuator) core ontology, was published as a joint W3C and OGC Recommendation in October 2017.

3.3.2.6 NGSI-9/10

The FI-WARE version of the Open Mobile Alliance (OMA) NGSI-9 interface is a RESTful API over HTTP. Its purpose is to exchange information about the availability of context information. The three main interaction types are:

• one-time queries for discovering hosts (also called 'agents' here) where certain context information is available

• subscriptions for context availability information updates (and the corresponding notifications)

• registration of context information, i.e. announcements that certain context information is available (invoked by context providers).

The FI-WARE version of the OMA NGSI-10 interface is a RESTful API over HTTP. Its purpose is to exchange context information. The three main interaction types are:

• one-time queries for context information

• subscriptions for context information updates (and the corresponding notifications)

• unsolicited updates (invoked by context providers).
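To make the NGSI-10 interactions more concrete, the sketch below builds JSON request bodies in the style of the FI-WARE NGSI v1 REST binding as implemented by context brokers such as Orion: a one-time `queryContext` and an `updateContext` sent by a context provider. The entity type `WeatherStation`, the entity id and the attribute values are hypothetical, and the assumed resource paths (e.g. `/v1/queryContext`, `/v1/updateContext`) should be checked against the broker's documentation; the functions only build the bodies and do not send them.

```python
import json


def query_context(entity_type, entity_id, attributes):
    """Body of an NGSI-10 queryContext request (NGSI v1 JSON style)."""
    return {
        "entities": [
            {"type": entity_type, "isPattern": "false", "id": entity_id}
        ],
        "attributes": list(attributes),
    }


def update_context(entity_type, entity_id, attributes, action="APPEND"):
    """Body of an NGSI-10 updateContext request.

    `attributes` maps attribute names to (type, value) pairs. "APPEND"
    adds attributes (creating them if needed); "UPDATE" modifies
    existing ones.
    """
    return {
        "contextElements": [{
            "type": entity_type,
            "isPattern": "false",
            "id": entity_id,
            "attributes": [
                {"name": name, "type": attr_type, "value": value}
                for name, (attr_type, value) in attributes.items()
            ],
        }],
        "updateAction": action,
    }


# Hypothetical entity and attribute.
print(json.dumps(query_context("WeatherStation", "Station1",
                               ["temperature"]), indent=2))
```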

3.3.2.7 IoT Architecture - Thing, Resource, Entity

One relevant ontology in this area is the IoT-Lite Ontology (http://iot.ee.surrey.ac.uk/fiware/ontologies/iot-lite). Surprisingly, there are no standards with regard to events. As a result, each event processing tool has its own programming model and semantics, and the same holds for the data representation of events.

3.3.3 API approach

The API approach is well tested and relatively widely used. There are many categories of APIs: web-based systems (e.g. REST APIs), operating systems (e.g. Cocoa), database access (e.g. Django's database-abstraction API) and hardware systems. APIs typically include three elements: access control, a request (operation and parameters) and a response (data or service).
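The three elements can be illustrated with a minimal, framework-free request handler: it first checks access control, then dispatches the requested operation with its parameters, and finally returns a response with a status code and data. The API keys, the operation name `get_dataset` and the dataset contents are hypothetical, and a real data API would of course add authentication, transport and error handling appropriate to its platform.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    api_key: str                              # used for access control
    operation: str                            # e.g. "get_dataset"
    params: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: dict


# Hypothetical data and credentials, for illustration only.
DATASETS = {"catch-reports": {"records": 3}}
API_KEYS = {"key-123": {"get_dataset"}}       # key -> permitted operations


def handle(req: Request) -> Response:
    """Apply access control, dispatch the operation, build a response."""
    allowed = API_KEYS.get(req.api_key)
    if allowed is None:
        return Response(401, {"error": "unknown API key"})
    if req.operation not in allowed:
        return Response(403, {"error": "operation not permitted"})
    if req.operation == "get_dataset":
        name = req.params.get("name")
        if name not in DATASETS:
            return Response(404, {"error": f"no dataset {name!r}"})
        return Response(200, {"name": name, "data": DATASETS[name]})
    return Response(400, {"error": "unsupported operation"})


print(handle(Request("key-123", "get_dataset", {"name": "catch-reports"})))
```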


Lately, businesses have changed their view of APIs from a technology to a business enabler. Gartner9 introduces the concept of the "API economy" together with "digital business". APIs allow data to be provided and consumed across platforms, systems and services using standards in a secure and reliable manner. However, there are some challenges regarding security and efficiency when APIs are used as a data access mechanism.

A successful example of API-based data access is Transport for London, which shares 200 data elements through an API10. The API is used by 600 different apps, which are used by 42% of London's population11.

3.4 Stakeholders and concerns

ArchiMate is used as a specification tool in the DataBio project, and each dataset/datastream is related explicitly to a set of pilot systems, stakeholders, components and/or pipelines. The ArchiMate motivation and strategy diagrams specify the goals, drivers and outcomes of each pilot system, indicating the relevance and use of the datasets/streams. Figure 7 shows a strategy diagram from the B2 fishery pilots, where the goals and outcomes are realized through extensive data collection and processing.

Figure 7: ArchiMate strategy diagram showing how the pilot system will realize the defined goals

Furthermore, ArchiMate is used to model the pilot applications that realize the outcomes. Figure 8 shows how "Provide decision support for pelagic fisheries planning" (shown in Figure 7) is supported by a set of process steps, including datasets, stakeholders and interactions. The application diagram identifies EO Data, Vessel Operation Data, Meteorological Forecast and Catch reports as the required datasets/streams.

9 https://www.gartner.com/smarterwithgartner/
10 https://api.tfl.gov.uk/
11 https://tfl.gov.uk/info-for/open-data-users/open-data-policy

Figure 8: ArchiMate business diagram showing the data processing, datasets and actors involved

Each dataset can then be broken down into subsets (from ArchiMate Business Objects to ArchiMate Data Objects), as shown in Figure 9.


Figure 9: ArchiMate data view for one of the fishery pilots (B2)

The pilots are realized from both datasets and DataBio components. Each pilot system uses a set of components to implement the required big data processing steps: collection, preparation, analysis, visualization and access. Figure 10 shows how the B2 fishery pilot is designed.

Figure 10: The B2 fishery pilot lifecycle view showing how data is provided as input to processing steps


To further specify the application architecture, a pipeline view is created for each pilot system. The pipeline shows the component and dataset interfaces. Figure 11 shows the B2 fishery pilot pipeline with all its components and datasets.

Figure 11: The B2 fishery pilot pipeline view showing how datasets are interfaced

All pilots in DataBio are modelled in ArchiMate following this methodology. This provides traceability from stakeholders and goals to the application realization:

• A stakeholder has a goal that will have an outcome

• An outcome is created from a set of actions

• An action requires a set of resources

• A resource can be a dataset or component (processing)

• Datasets and components are combined in an architecture through interfaces and responsibilities.
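The traceability chain listed above can be sketched as a small data structure that answers questions such as "which datasets does this goal depend on?". This is only an illustration of the idea, not the ArchiMate exchange format or the Modelio API. The outcome name and the four dataset names come from the B2 fishery pilot described above; the stakeholder, goal, action and component names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    kind: str  # "dataset" or "component"


@dataclass
class Action:
    name: str
    resources: list[Resource] = field(default_factory=list)


@dataclass
class Outcome:
    name: str
    actions: list[Action] = field(default_factory=list)


@dataclass
class Goal:
    stakeholder: str
    name: str
    outcomes: list[Outcome] = field(default_factory=list)


def trace(goal: Goal, kind: str) -> list[str]:
    """Return, in first-seen order, the resources of a kind behind a goal."""
    names = []
    for outcome in goal.outcomes:
        for action in outcome.actions:
            for res in action.resources:
                if res.kind == kind and res.name not in names:
                    names.append(res.name)
    return names


outcome = Outcome("Provide decision support for pelagic fisheries planning", [
    Action("Plan fishing trip", [            # hypothetical action
        Resource("EO Data", "dataset"),
        Resource("Vessel Operation Data", "dataset"),
        Resource("Meteorological Forecast", "dataset"),
        Resource("Catch reports", "dataset"),
        Resource("Forecast model", "component"),  # hypothetical component
    ]),
])
goal = Goal("Fishing company", "More efficient pelagic fisheries", [outcome])
print(trace(goal, "dataset"))
```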

Using Softeam's Modelio software, users can navigate through the DataBio ArchiMate models for pilots and components to understand, compare and document the systems and subsystems.


3.5 License models for data reuse

There is a wide range of licensing schemes for publishing datasets. For example, data.world lists 13 common schemes ranging from the most open to the most restrictive [REF-21]. These licences are typically Creative Commons (CC) licences, which originate from the open source domain.

In addition to the more or less open models, there are several models for the commercial licensing of closed datasets for business-to-business (B2B) and business-to-government (B2G) purposes, including International Data Spaces (IDS), Unified eXchange Platform (UXP) and Sharemind from Cybernetica.


4 Requirements view

4.1 Types of EO data and sensors used in the DataBio pilots and their characteristics

Remote sensing is one of the most common ways to extract relevant information about the Earth and our environment. Remote sensing acquisitions, made with both active (synthetic aperture radar, LiDAR) and passive (optical and thermal range, multispectral and hyperspectral) sensors, provide a variety of information about land and ocean processes. Different types of Earth Observation data have been developed over the last forty years, bringing significant changes in the context of the Big Data concept.

A typical Big Data application chain may require EO input data in addition to other sensor data, as depicted in Figure 12.

Figure 12: EO Data Collection Context

A significant part of the 26 DataBio pilots use EO (Earth Observation) data as input for their specific purposes, in the context of efficient resource use and increasing productivity in agriculture, forestry and fishery. The general data types, including EO data, used in the DataBio pilot projects are listed in Table 2.


Table 2: Types of data used in DataBio pilot projects N

o. o

f p

ilot Category

Lea

der

Name of

pilot

Partn

ers

AOI Data used in the pilot

1

AG

RIC

ULT

UR

E

A. P

reci

sio

n H

ort

icu

ltu

re in

clu

din

g vi

ne

and

oliv

es A1. Precision

agriculture in

olives, fruits,

grapes and

vegetables N

P A1.1

Precision

agriculture

in olives,

fruits,

grapes

NP,

GAIA

Epiche

irein

Greece (Pilot Site A:

Chalkidiki - 600 ha,

Pilot Site B: Stimagka

- 3 000 ha, Pilot Site

C: Veria - 10 000 ha)

data directly from the field,

collected from a network

of telemetric IoT stations

called GAIAtrons; remotely

with image sensors on in-

orbit platforms; and by

monitoring the application

of inputs and outputs in

the farm (e.g. in-situ

measurements, farm logs,

farm profile)

2 A1.2

Precision

agriculture

in vegetable

seed crops

C.A.C.,

VITO

Eastern Italy.

Location: 5 farms,

Emilia Romagna

Region, for the total

acreage of 14,79

hectares in the first

year. To be expanded

to other crops in the

same Region and in

Region Marche.

satellite imagery, weather

and soil data and

yield/seed maturity

predictions

3 A1.3

Precision

agriculture

in

vegetables -

2 (Potatoes)

NB

Advies

, VITO

Veenkoloniën region

in the Netherlands

historical yield data - field

characteristics (sample

data yield data, potato

varieties, planting data

etc.), historical earth

observation data

4 A2. Big Data

management

in

greenhouse

eco-systems

A2.1 Big

Data

manageme

nt in

greenhouse

eco-

systems

CREA,

CERTH

greenhouse

horticulture in the

Thessali Region,

Greece

experimental data: whole

genome genotypic data,

metabolomics and

phenomic (lab) data;

observational data:

phenomics (field), sensor

data, environmental

indoor and outdoor

Page 49: D4.3 Data sets, formats and models (Public version) · ontologies and potential standard data models and access mechanisms/services and APIs. There will be an increased focus on secure

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Dissemination level: PU -Public Page 49

5

B. A

rab

le P

reci

sio

n F

arm

ing B1. Cereals

and biomass

crops

Vit

o B1.1

Cereals and

biomass

crops 1

TRAGS

A

Cabreros del Río,

Castile - Leon, Spain,

“Ribera del Porma”

Farmers Community:

24.270 ha

high resolution (Sentinel-2

type) satellite images,

complemented with sensor

data and, in some specific

cases, with RPAS

(Remotely Piloted Aircraft

Systems) data and external

data

6 B1.2

Cereals and

biomass

crops 2

NP,

GAIA

Epiche

irein

Elassona, Greece-

2500 ha of maize as

targeted crops

data directly from the field,

collected from a network

of telemetric IoT stations

called GAIAtrons; remotely

with image sensors on in-

orbit platforms; and by

monitoring the application

of inputs and outputs in

the farm (e.g. in-situ

measurements, farm logs,

farm profile)

7 B1.3

Cereals,

biomass

crops 3

(Biomass

crops

monitoring

and

performanc

e

predictions)

CREA,

VITO,

NOVA

MONT

24 sites in Emilia

Romagna, Italy (120

ha) - CREA sorghum

pilot, 3 sites in Emilia

Romagna and

Veneto, Italy (6 ha) -

CREA fiber hemp

pilot, 4 sites in North

and South-Western

Sardinia, Italy (65 ha)

- NOVAMONT

cardoon pilot

satellite imagery,

telemetry IoT data (air

temperature, air moisture,

solar radiation, leaf

wetness, rainfall, wind

speed and direction, soil

moisture, soil temperature,

soil EC / salinity, PAR,

barometric pressure),

phenotypic data collected

for each cropping season

8 B1.4

Cereals,

biomass

crops 4

(Cereal crop

monitoring)

LESPR

O

8300 ha - Rostenice

(Vyskov, Czech

Republic); target

crops: cereals -

winter wheat, spring

barley, grain maize

EO data (Landsat 8 -

Landsat data repository -

(https://espa.cr.usgs.gov),

Sentinel 2A/B -

(https://scihub.copernicus.

eu/), Google Earth Engine

platform for fast viewing

EO data:

(https://earthengine.googl

e.com/), field boundaries

from Czech LPIS database

as shp or xml

(http://eagri.cz/public/app

/eagriapp/lpisdata/),

ortophotos, topography

maps, cadastral maps – as

Page 50: D4.3 Data sets, formats and models (Public version) · ontologies and potential standard data models and access mechanisms/services and APIs. There will be an increased focus on secure

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Dissemination level: PU -Public Page 50

WMS service, farm data -

Crop rotation, crop

treatments records, yield

maps, soil maps

9 B2.

Machinery

management

and

environment

al issue

B2.1

Machinery

manageme

nt

LESPR

O,

ZETOR,

Federu

AOI in Czech Republic telemetry data from

machinery, other farm

data

10

C. S

ub

sid

ies

and

Insu

ran

ce C1.

Insurance e-

GEO

S C1.1

Insurance

NP 12000 ha in North

Greece - targeted

crops: 7 types

(wheat, stone fruits

etc.)

EO data, field data (soil

temperature, humidity -

multi-depth, ambient

temperature, humidity,

barometric pressure, solar

radiation, leaf wetness,

rainfall volume, wind

speed and direction),

historical and current

weather data, via the IoT

strations network,

enriched with yield data

information extracted from

the work calendar and

stored in the NP’s cloud

infrastructure

11 C1.2 Farm

Weather

Insurance

Assessment

e-

GEOS

AOI in Italy Copernicus satellite data

series, meteorological

data, other ground

available data

12 C2. CAP

Support

C2.1 CAP

Support

e-

GEOS,

TerraS,

Tragsa

AOI in Northern Italy

(50.000 ha) - 2

targeted crop types,

AOI in Southeastern

Romania (10.000

sqkm.) - 3 - 10 crop

types

data related to parcel

information and provided

by the users, satellite

optical and SAR data, in-

situ / field data

13 C2.2 CAP

Support -

Greece

NP,

GAIA

Epiche

irein

AOI in Northern

Greece (50.000 ha) -

2 targeted crops: dry

beans, peaches

data directly from the field,

collected from a network

of telemetric IoT stations

called GAIAtrons; remotely

with image sensors on in-

orbit platforms (EO data),

anonymized IASC data

Page 51: D4.3 Data sets, formats and models (Public version) · ontologies and potential standard data models and access mechanisms/services and APIs. There will be an increased focus on secure

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Dissemination level: PU -Public Page 51

14

FOR

ESTR

Y

A. M

ult

iso

urc

e an

d d

ata

cro

wd

sou

rcin

g /

e-s

erv

ice

s A1. Easy data

sharing and

networking

MH

GS 2.2.1 Easy

data

sharing and

networking

MHGS,

VTT,

SPACE

BEL,

METSA

K, FMI

two estates, called

“Rangunkorven

yhteismetsä” and

“Taipale”, both

located in Central

Finland; follows up

the implementation

according to the

defined specifications

in Czech Republic and

Belgium by WP2

partners

forestry data transferred

via the Finnish forestry

standard XML format , real

time updates from field

measurements, forest

owners’ forest

management plans and

other notifications from

forest owners, forestry

operators and other

stakeholders; processed

data: forest estate,

geometry of

compartments, type of the

forest work, sample plot

locations, measured data

per sample plot,

measurement averages per

compartment,

measurement date and

user information; control

significant vegetation

changes, such as clear-cuts

and forest damage areas to

act in time

15 A2.

Monitoring

and control

tools for

forest

owners

2.2.2

Monitoring

and control

tools for

forest

owners

MHGS,

FMI,

TRAGS

A,

METSA

K

AOI in Finland forestry data, real time

updates from field

measurements, forest

owners’ forest

management plans and

other notifications from

forest owners, forestry

operators and other

stakeholders; processed

data: forest estate,

geometry of

compartments, type of the

forest work, sample plot

locations, measured data

per sample plot,

measurement averages per

compartment,

measurement date and

user information; control

significant vegetation

changes, such as clear-cuts

and forest damage areas to

act in time

Page 52: D4.3 Data sets, formats and models (Public version) · ontologies and potential standard data models and access mechanisms/services and APIs. There will be an increased focus on secure

D4.3 – Data sets, formats and models (Public version) H2020 Contract No. 732064 Final – v1.0-Public, 12/12/2018

Dissemination level: PU -Public Page 52

16

B. F

ore

st H

ealt

h /

Re

mo

te /

Cro

wd

sen

sin

g, In

vasi

ve s

pec

ies

/ d

amag

e B1. Forest

damage

remote

sensing

TRA

GSA

2.3.1 Forest

damage

remote

sensing

MHGS,

VTT,

SENOP

,

METSA

K,

SPACE

BEL

the main

demonstration areas

are the Hippala and

Rangunkorpi forest

plots in South-

Eastern Finland; in

Wallonia, FMI, with

the support of

Spacebel, aims to

develop a remote

sensing service to

provide a spatial

distribution of the

vulnerability and risk

exposure to diseases

and other potential

hazards based on

Sentinel-2 or

Sentinel-1+Sentinel-2

EO data (in particular

optical Sentinel-2 satellite

data), precise data from

airborne and field

measurements, used to

train and validate the

method

17 2.3.2-FH

Monitoring

of forest

health

TRAGS

A,

SENOP

,

CSEM,

CiaoT,

FMI,

VTT

large areas in the

Iberian Peninsula -

Spain (Extremadura,

Andalucia, Castilla y

León, Castilla La

Mancha, Madrid

remote sensing images

(satellite + aerial + UAV),

field dat

18 | B2. Invasive alien species control – plagues – forest management | 2.3.2 IAS - Invasive alien species control and monitoring | TRAGSA, SENOP, CSEM, CiaoT, FMI, VTT | Spain - the Iberian Peninsula, the Canary Islands and the Balearic Islands | EO data (Sentinel 2, Landsat 8), several alphanumeric Big Data databases - centralized data - WORLDCLIM dataset (described in the International Journal of Climatology; 19 bioclimatic raster layers with a resolution of 1 km), foreign trade database from the Spanish Finance Ministry, Immigration Database by the Spanish Statistical Institute, tourism dataset from the Ministry of Energy, Tourism and Digital Agenda, GHS - population grid (developed by JRC), Spanish terrestrial transport network (ESRI shp, provided by the National Geographic Institute), NUTS-2, NUTS-3, Municipalities maps from GADM - Global Administrative Areas

19 | C. Forest data management services | C1. Web-mapping service for government decision making | METSAK | 2.4.1 Web-mapping service for government decision making | FMI, VTT, SPACEBEL | Czech Republic, Wallonia (Belgium) | Sentinel-2 satellite data, distributed by the European Space Agency, forest management maps, in-situ LAI (Leaf Area Index) observations - in-situ data from a total of 189 forest plots with varying species composition and structures

20 | C2. Shared multiuser data environment | 2.4.2 Shared multiuser data environment | METSAK, VTT | Finland | centralized forest resource data - the original source of the forest resource data can be laser scanning, field measurement, growth modelling or a notification from the forest owner or forestry operator. Other data sources: Kemera financing data, forest use declarations, access and authorization.

21 | FISHERY | A. Fishing vessels immediate operational choices | A1. Oceanic tuna fisheries immediate operational choices | SINTEF Fishery | A1. Oceanic tuna fisheries immediate operational choices | EHU-UPV | South Atlantic, Indian Ocean | EO data (Sentinel 3, CMEMS products), data from on-board monitoring systems / fleet sensor observations (vessel engine sensors - velocity and heading, position of the vessel, fish catches - species, weight), weather and sea condition information.

22 | A2. Small pelagic fisheries immediate operational choices | A2. Small pelagic fisheries immediate operational choices | SINTEF Ocean | small pelagic fishing fleet, covering the North Atlantic Ocean | time series measurements collected from a variety of sources (power system, navigation system, weather sensors, deck machinery), sonar / hydroacoustic data; EO data evaluated for inclusion in the pilot


23 | B. Fishing vessel trip and fisheries planning | B1. Oceanic tuna fisheries planning | AZTI | B1. Oceanic tuna fisheries planning | AZTI | South Atlantic, Indian Ocean | EO data, large datasets of historical data (logbooks, VMS, GPS, Buoys, Observers), fuel consumption data, captures data, weather forecast

24 | B2. Small pelagic fisheries planning | B2. Small pelagic fisheries planning | SINTEF Ocean | small pelagic fishing fleet, typically covering the North Atlantic Ocean | extensive datasets on fisheries activity and catch statistics, combined with concurrent and historical information such as meteorological and oceanographic data (meteorological and oceanographic hindcasts and forecasts), moon phase, time of day, time of year, sonar data

25 | C. Fisheries sustainability and value | C1. Pelagic fish stock assessments | SINTEF Fisheries | C1. Pelagic fish stock assessments | SINTEF Fisheries | northeast Atlantic; Norwegian coast | hydroacoustics, oceanographic and meteorological data (ocean surface currents, temperatures etc.), collected in-situ or through remote sensing, estimates of fish species and densities, catch reports, oceanographic simulations, stock simulations

26 | C2. Small pelagic market predictions and traceability | C2. Small pelagic market predictions and traceability | SINTEF Fisheries | the small pelagic fisheries in the North Atlantic Ocean | centralized data: market trends by the World Bank and Norwegian Seafood Council (market insight data, statistics, trade information, consumption and consumer insight), pelagic auction data (a database containing information about all pelagic catches landed in Norway in the last decades), provided by Norges Sildesalgslaget; distributed/local data: fish stock observations (hydroacoustic and sonar instruments), quality measurements, vessel operations data (motion and cost of operation)

The datasets can also be grouped as follows:

1. Existing datasets utilized by DataBio pilots: datasets that are available and relevant to the pilot systems in DataBio. The DataBio project demonstrates their usefulness and provides recommendations for their use.

2. Existing datasets that the DataBio project has improved in terms of findability, accessibility, interoperability or reusability.

3. New datasets created by the DataBio project by combining or processing existing data sources.

Subsequently, the types of EO data and sensors (classified into optical and SAR data) used in the DataBio pilots are presented in terms of their main features: mission objectives, spatial, temporal and radiometric resolution, coverage, data access, etc., with particular regard to how these EO data, including derived EO products/results, are used in the pilots.

4.2 Datasets and datastream requirements from Platform

This section describes the platform requirements that are related to EO datasets and datastreams. Each requirement (EO-xxxxxx) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID Requirement

EO-441020 The DataBio Platform shall discover EO metadata through interfaces compliant with the OGC 13-026r8 specification.

Implementations N/A


Derived from

EO-441031 Discover available historical EO products

Implementations

Derived from

EO-441032 Discover extreme weather data

Implementations N/A

Derived from

EO-441040 The Proba-V data shall be discoverable using an OpenSearch interface which can be integrated in FedEO.

Implementations N/A

Derived from

EO-442020 Access to the interface of the catalog where the Sentinel-2 data is stored (if stored remotely) shall be granted to the pilots.


Implementations N/A

Derived from

EO-442030 The Proba-V data shall be accessible using the product URLs as returned in the OpenSearch responses (discovery step).

Implementations N/A

Derived from

EO-444010 The Proba-V MEP platform should provide a processing cluster allowing parallel computing and data analytics on Proba-V data and selected Sentinel-2 derived vegetation indices at country/region scale.

Implementations N/A

Derived from
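Requirements EO-441020, EO-441040 and EO-442030 together describe a discover-then-access pattern: query an OpenSearch endpoint, then download products through the URLs returned in the response. The Python sketch below illustrates that pattern under stated assumptions: the endpoint, the collection identifier and the parameter names (`parentIdentifier`, `bbox`, `start`, `end`, `count`) are hypothetical placeholders, since a real OGC 13-026r8 service declares its own query template in its OpenSearch description document, and the sample Atom response is made up. Taking product URLs from `enclosure` links is a common convention that should be checked against the actual service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def build_query(endpoint, collection, bbox, start, end, count=10):
    """Build an OpenSearch search URL.

    The parameter names are assumptions; a real service publishes its URL
    template (and the names it binds to geo:box, time:start, time:end, ...)
    in its OpenSearch description document.
    """
    params = {
        "parentIdentifier": collection,           # collection to search in
        "bbox": ",".join(str(v) for v in bbox),   # west,south,east,north
        "start": start,                           # ISO 8601 start of interval
        "end": end,                               # ISO 8601 end of interval
        "count": count,                           # maximum number of results
    }
    return f"{endpoint}?{urlencode(params)}"


def product_urls(atom_xml):
    """Return the href of every 'enclosure' link in an Atom search response."""
    root = ET.fromstring(atom_xml)
    urls = []
    for entry in root.iter(f"{ATOM}entry"):
        for link in entry.findall(f"{ATOM}link"):
            if link.get("rel") == "enclosure":
                urls.append(link.get("href"))
    return urls


# Made-up response with two products; only enclosure links point to downloads.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>product 1</title>
    <link rel="enclosure" href="https://example.org/data/product1.zip"/>
  </entry>
  <entry>
    <title>product 2</title>
    <link rel="alternate" href="https://example.org/meta/product2.xml"/>
    <link rel="enclosure" href="https://example.org/data/product2.zip"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query("https://example.org/opensearch", "example-collection",
                      (4.0, 50.0, 6.0, 51.0), "2018-01-01", "2018-01-31"))
    print(product_urls(SAMPLE))
```

Parsing the standard Atom response keeps the client independent of any particular portal; a service may also offer other response formats (for example GeoJSON), which would need a different parser.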

4.3 Datasets and datastream requirements from Agriculture pilots

This section describes the Agriculture pilots’ requirements that are related to datasets and datastreams. Each requirement (R1.x.y_z) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID


R1.2.1_6 A pilot needs the growth model

Implementations

Derived platform requirements

R1.2.1_7 A pilot needs EO data (historical and current).

Implementations

Derived platform requirements

R1.2.1_8 A pilot needs weather data (historical and current)

Implementations

Derived platform requirements

R1.3.1_4 A pilot needs the current solution to be improved, developed and scaled from 34,000 ha to several municipalities and NUTS-2 level


Implementations

Derived platform requirements

R1.3.1_6 A pilot needs availability of historical and current EO data (including vegetation indices, e.g., NDVI, EVI, NDRE, NDMI)

Implementations

Derived platform requirements

R1.3.1_7 A pilot needs analysis of EO, DEM, soil and crop data by applying machine learning algorithms to identify management zones within the fields, and export of these zones in vector format (shp, ISOXML)

Implementations

Derived platform requirements

R1.3.1_8 A pilot needs analysis of the spatial variability of crop status and an alerting service

Implementations


Derived platform requirements

R.1.3.1_9 A pilot needs reporting, either by field or aggregated by crop type

Implementations

Derived platform requirements

R.1.3.1_11 A pilot needs: (1) components that make it possible to harness satellite data for applications in farm telemetry, with particular interest in crop monitoring and predictions; (2) components for crop monitoring and real-time analytics using real-time streaming data from wireless sensor networks, with the capability to trigger alarms/notifications/recommendations in order to improve farm operations and productivity

Implementations

Derived platform requirements

R.1.4.1_3 A pilot needs availability of current and historical EO data (including, for example, vegetation indices such as NDVI and LAI)


Implementations

Derived platform requirements

R.1.4.1_4 A pilot needs availability of weather data (integrated together with weather stations data). Parameters will be temperature, rainfall and humidity.

Implementations

Derived platform requirements

R.1.4.1_5 A pilot needs analysis of historical EO and weather data by applying machine learning algorithms to assess the impact of bad weather conditions

Implementations

Derived platform requirements
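Requirements R1.3.1_6 and R.1.4.1_3 refer to vegetation indices derived from EO data. As an illustration, the sketch below computes NDVI, NDRE, NDMI and EVI from surface reflectance values (0 to 1) using their commonly published formulas: a normalized difference (a - b)/(a + b) for the first three, and EVI with the widely used coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1. The band mapping is an assumption to be checked against each pilot's processing chain (for Sentinel-2, for example, blue B2, red B4, red edge B5, NIR B8, SWIR B11). LAI is a biophysical parameter rather than a band ratio and is not covered here.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b) element-wise, with NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return normalized_difference(nir, red_edge)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index."""
    return normalized_difference(nir, swir1)


def evi(nir, red, blue, g=2.5, c1=6.0, c2=7.5, l=1.0):
    """Enhanced Vegetation Index: G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L)."""
    nir, red, blue = (np.asarray(x, dtype=float) for x in (nir, red, blue))
    return g * (nir - red) / (nir + c1 * red - c2 * blue + l)


if __name__ == "__main__":
    # Two illustrative pixels: dense vegetation and sparse cover.
    nir = np.array([0.40, 0.30])
    red = np.array([0.05, 0.25])
    blue = np.array([0.03, 0.15])
    print("NDVI", ndvi(nir, red))
    print("EVI ", evi(nir, red, blue))
```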


4.4 Datasets and datastream requirements from Forestry pilots

This section describes the Forestry pilots’ requirements that are related to datasets and datastreams. Each requirement (R2.x.y_z) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID

R2.2.2_1 A pilot needs damage and quality reporting features in the Wuudis mobile app (MHG), standard development (METSAK) and integrations

Implementations

Derived platform requirements

R2.3.1_1 A pilot needs new satellite and remote sensing (RS) map layers provided via a WMS/WMTS interface, development of customizable map layers for Wuudis (MHG), and development of a real-time forest management service based on multiple forest big data sources

Implementations


Derived platform requirements

R2.3.2_1 A pilot needs to learn about methodologies for assessing and monitoring forest health status

Implementations

Derived platform requirements

R2.4.1_1 A pilot needs shared repository of Sentinel-1 and Sentinel-2 satellite images.

Implementations

Derived platform requirements

R.2.4.1_2 A pilot needs a cloud environment with components for satellite data pre-processing (components FMI 1-4)

Implementations


Derived platform requirements

R2.4.2_2 A pilot needs development of an XML standard (METSAK) for updating forest damage and forest stand information, integrations and an X-Road approach for data transfer services, as well as development of a data visualization/map service for forest damage information

Implementations

Derived platform requirements
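R2.3.1_1 asks for satellite and remote sensing map layers served through WMS/WMTS. For orientation, a minimal Python sketch of an OGC WMS 1.3.0 GetMap request URL follows. The endpoint and layer name are hypothetical placeholders, not actual DataBio or Wuudis services. One detail that often causes errors: in WMS 1.3.0 the BBOX must follow the axis order of the CRS, which for EPSG:4326 is latitude first.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, crs="EPSG:4326",
                   width=512, height=512, fmt="image/png", style=""):
    """Build a WMS 1.3.0 GetMap URL.

    bbox must follow the axis order of `crs`: for EPSG:4326 under WMS 1.3.0
    that is (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": style,                          # empty string = default style
        "CRS": crs,
        "BBOX": ",".join(f"{v:g}" for v in bbox),
        "WIDTH": width,                           # output image size in pixels
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, for illustration only.
    print(wms_getmap_url("https://example.org/wms", "forest_damage_layer",
                         (60.0, 24.0, 61.0, 26.0)))
```

WMTS serves pre-rendered tiles instead of arbitrary extents, so a WMTS client addresses tiles by tile matrix, row and column rather than by BBOX; which of the two suits a given layer depends on how often it changes.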

4.5 Datasets and datastream requirements from Fishery Pilots

This section describes the Fishery pilots’ requirements that are related to datasets and datastreams. Each requirement (R3.x.y_z) has a textual description, zero or more implementations in DataBio, and one or more relationships to requirements specified in the pilots. Full details and navigation are provided in the ArchiMate models.

ID Requirement Implementations

R3.3.1_1 A pilot needs satellite data streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations.

Implementations


Derived platform requirements

R3.3.1_2 A pilot needs ocean current simulation data streams.

Implementations

Derived platform requirements

R3.3.1_3 A pilot needs buoy data and the position of the vessel

Implementations

Derived platform requirements


R3.3.2_1 A pilot needs meteorological data to be available in the vessel power system

Implementations

Derived platform requirements

R3.3.2_2 A pilot needs meteorological data to be collected by interfacing with existing sensors or with newly provided sensors

Implementations

Derived platform requirements

R3.3.2_3 A pilot needs meteorological data to be collected by interfacing with existing sensors or with newly provided sensors

Implementations

Derived platform requirements

R3.3.2_4 A pilot needs satellite data streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations.

Implementations

Derived platform requirements

R3.4.1_1 A pilot needs missing data sources including fishery-dependent data, fishery-independent data, oceanography.

Implementations

Derived platform requirements

R3.4.1_2 A pilot needs fishery-dependent data: landed catch (Sildes, ICES), scientific surveys (IMR), ERS (Norwegian Directorate of Fisheries)

Implementations

Derived platform requirements

R3.4.1_3 A pilot needs fishery-independent data: publicly available scientific survey data, hydroacoustics from fishing vessels (perhaps through ratatosk C17.01)


Implementations

Derived platform requirements

R3.4.1_4 A pilot needs oceanographic data: Satellite streams of sea surface temperature, sea surface salinity, sea level anomalies, ice concentrations, chlorophyll-a concentrations. (ICES, met.no, SPACEBEL, ..)

Implementations

Derived platform requirements

R3.4.2_2 A pilot needs machine learning & data analysis components for finding covariations (multivariate/PCA analysis) and estimating price prediction models.

Implementations

Derived platform requirements
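R3.4.2_2 mentions multivariate/PCA analysis for finding covariations. A minimal NumPy sketch of principal component analysis via the singular value decomposition of the centred data matrix follows. The input layout (rows as observations, e.g. weekly records; columns as variables, e.g. price, landed volume, sea temperature) is an assumption for illustration, and standardizing each column is a design choice that matters when the variables are measured in different units.

```python
import numpy as np


def pca(X, standardize=True):
    """Principal component analysis via SVD.

    X: (n_samples, n_features) array. Returns (components, explained_ratio,
    scores); components holds one principal axis per row, sorted by the
    share of variance it explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each variable
    if standardize:
        std = Xc.std(axis=0, ddof=1)
        std[std == 0] = 1.0                  # leave constant columns unscaled
        Xc = Xc / std
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)            # variance along each component
    explained_ratio = var / var.sum()
    scores = Xc @ Vt.T                       # samples projected onto the axes
    return Vt, explained_ratio, scores


if __name__ == "__main__":
    # Synthetic data: the second variable co-varies with the first.
    rng = np.random.default_rng(0)
    a = rng.normal(size=100)
    X = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=100),
                         rng.normal(size=100)])
    components, ratio, scores = pca(X)
    print("explained variance ratio:", ratio.round(3))
    print("first axis loadings:", components[0].round(3))
```

Large loadings of the same sign on the first axis indicate variables that move together, which is the kind of covariation the requirement refers to.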


5 Datasets: existing, improved, new and others

This section presents the datasets identified by the DataBio project as relevant for the selected domains: agriculture, forestry and fishery. The datasets are grouped into four categories based on their availability:

1. Existing datasets utilized by DataBio pilots: datasets that are available and relevant to the pilot systems in DataBio. The DataBio project demonstrates their usefulness and provides recommendations for their use.

2. Existing datasets that the DataBio project has improved in terms of findability, accessibility, interoperability or reusability.

3. New datasets created during the DataBio project by collecting new data or by combining or processing existing data sources.

4. Other datasets that might be of (future) relevance to DataBio pilots or similar systems.

Please note that many dataset descriptions are missing some parameters. The datasets are continuously being added to the DataBioHub, and most of the missing parameters will be included as they are harvested automatically from the data sources.

The datasets are presented with the available metadata. The full metadata template structure

is provided in Appendix A.
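Since many descriptions are still incomplete, it can help to treat each description as a record with optional fields and list what is missing. The Python sketch below is illustrative only: the field names are a subset taken from the tables in this section, not the full template in Appendix A or the DataBioHub schema, and `missing_fields` is a hypothetical helper. The example record uses values from the Open Transport Map table below.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetDescription:
    """Subset of the metadata fields used in the dataset tables (illustrative)."""
    internal_name: str
    name: Optional[str] = None
    short_description: Optional[str] = None
    version: Optional[str] = None
    initial_availability_date: Optional[str] = None
    data_type: Optional[str] = None
    personal_data: Optional[str] = None
    rightsholder: Optional[str] = None
    data_volume: Optional[str] = None
    update_frequency: Optional[str] = None
    geographical_coverage: Optional[str] = None
    timespan: Optional[str] = None
    access_level: Optional[str] = None
    access_mechanism: Optional[str] = None


def missing_fields(record: DatasetDescription) -> List[str]:
    """Names of fields that are still empty (None or blank string)."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "")]


otm = DatasetDescription(
    internal_name="D03.02",
    name="Open Transport Map",
    short_description="Road network for the whole EU, derived from OpenStreetMap",
    version="1.0",
    initial_availability_date="07.03.2017",
    data_type="geographic data",
    personal_data="no",
    rightsholder="Plan4all",
    data_volume="20 Gb",
    update_frequency="irregularly",
    geographical_coverage="European Union",
    timespan="2015-present",
)

if __name__ == "__main__":
    print(missing_fields(otm))
```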

5.1 Existing datasets utilized by DataBio Pilots

5.1.1 Open Transport Map (UWB - D03.02)

Field Value

Internal Name of the Dataset D03.02

Name of the Dataset/API Provider Open Transport Map

Short Description The Open Transport Map displays a road network which

– is suitable for routing

– visualizes average daily traffic volumes for the whole EU

– visualizes time-related traffic volumes (in the OTN pilot cities: Antwerp, Birmingham, Issy-les-Moulineaux, Liberec region)

Technically speaking, the Open Transport Map

– can serve as a map itself as well as a layer embedded in your map


– is derived from the most popular open dataset, OpenStreetMap

– is accessible via both GUI and API

– covers the whole European Union

Version 1.0

Initial Availability Date 07.03.2017

Data Type geographic data

Personal Data no

Rightsholder Plan4all

Other Rights Information Open Data Commons Open Database License (ODbL)

Dataset/API Owner/Responsible UWB

Dataset/API Owner/Responsible Contacts [email protected]

Technology

Name of the System Open Transport Map


Dataset Data Model/API Interface GUI, WMS, WFS, shapefile, all described at http://opentransportmap.info

Data Model: Standards, Glossaries and metadata standards WMS, WFS, shapefile, PostGIS

Data Identifier - Standard used

Data Model - Specific Data Model http://opentransportmap.info/img/OTM_physicalModelAndCodelists.svg

Data Volume 20 Gb

Update Frequency irregularly

Data Archiving and preservation

Geographical Coverage European Union

Timespan 2015-present
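The Open Transport Map is listed as accessible through WMS and WFS. A minimal Python sketch of an OGC WFS 2.0.0 GetFeature request URL that asks for road features inside a bounding box follows. The endpoint and the feature type name are hypothetical placeholders (the actual service addresses and type names are documented at http://opentransportmap.info), and the sketch assumes the server supports WFS 2.0.0; `COUNT` simply caps the number of returned features.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, type_name, bbox,
                       crs="urn:ogc:def:crs:EPSG::4326",
                       count=100, output_format=None):
    """Build a WFS 2.0.0 GetFeature URL with a BBOX filter.

    bbox follows the axis order of `crs`; for the EPSG:4326 URN this is
    (min_lat, min_lon, max_lat, max_lon). The CRS is appended as the fifth
    BBOX value, which the WFS 2.0 key-value encoding allows.
    """
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
        "BBOX": ",".join([*(f"{v:g}" for v in bbox), crs]),
        "COUNT": count,
    }
    if output_format:
        # Supported values are listed in the server's GetCapabilities response.
        params["OUTPUTFORMAT"] = output_format
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and feature type: roads around Liberec.
    print(wfs_getfeature_url("https://example.org/wfs", "otm:roads",
                             (50.70, 14.95, 50.80, 15.15)))
```

Unlike WMS, which returns a rendered image, WFS returns the vector features themselves, so this is the route to take when the road geometries or traffic attributes are needed for analysis.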

5.1.2 Forest resource data (METSAK - D18.01)

Field Value


Internal Name of the Dataset D18.01

Name of the Dataset/API Provider Forest resource data / METSAK

Short Description The pilot uses METSAK’s forest resource data concerning privately owned Finnish forests from METSAK’s forest resource data system. The forest resource data consists of basic data of tree stands (development class, dominant tree species, scanned height, scanned intensity, stand measurement date), strata of tree stands (mean age, basal area, number of stems, mean diameter, mean height, total volume, volume of logwood, volume of pulpwood), growth place data (classification, fertility class, soil type, drainage state, ditching year, accessibility, growth place data source, growth place data measurement date), geometry and compartment numbering. The forest resource data is available in a standard format for external use with the consent of the forest owner.

Extended Description The forest resources are inventoried once a decade in each area using remote sensing (airborne laser scanning) and aerial photographs. The new data is analysed and, in some parts, measured in the field. Other updates to the forest resource data are yearly growth calculations, possible notifications of forest use or other forestry operations, so-called Kemera financing operations, and possible new aerial photographs to be interpreted.

Version Oracle database and data model version 2.5.2.

Initial Availability Date from year 2010 onwards


Data Type Oracle database model

Personal Data User information

Rightsholder METSAK

Dataset/API Owner/Responsible Forest resource data / METSAK / Aapo Lindberg

Dataset/API Owner/Responsible Contacts Oracle database model/ [email protected]

Technology Oracle database

Name of the System Aarni

Data Model: Standards, Glossaries and metadata standards Oracle database model for forest resource data

Data Identifier - Standard used N/A

Data Model - Specific Data Model Oracle database model

Data Volume 1984 GB

Update Frequency Online

Data Archiving and preservation Real-time backup procedures as well as a database copy once a month


Geographical Coverage Finland

Timespan Data available from year 2010 onwards

Access Level METSAK users

Access Mechanism Active directory user management

5.1.3 Landsat 8 OLI data

Field Value

Internal Name of the Dataset Landsat 8 OLI

Name of the Dataset/API Provider NASA and the U.S. Geological Survey

Extended Description Landsat 8 (formerly the Landsat Data Continuity Mission, LDCM), a collaboration between NASA and the U.S. Geological Survey, provides moderate-resolution measurements of the Earth’s terrestrial and polar regions in the visible, near-infrared, shortwave infrared, and thermal infrared. Landsat 8 provides continuity with the more than 40-year-long Landsat land imaging dataset. Landsat 8 carries two push-broom instruments: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS).

The spectral bands of the OLI sensor provide enhancements over prior Landsat instruments, with the addition of two spectral bands: a deep blue visible channel (band 1) specifically designed for water resources and coastal zone investigation, and a shortwave infrared channel (band 9) for the detection of cirrus clouds.

The TIRS instrument collects two spectral bands for the wavelength range covered by a single band on the previous TM and ETM+ sensors.


The Landsat 8 mission’s objectives are:

· to provide data continuity with Landsats 4, 5, and 7;

· to offer 16-day repetitive Earth coverage, an 8-day repeat with a Landsat 7 offset;

· to build and periodically refresh a global archive of sun-lit, substantially cloud-free land images.

Data Type Level 0 (L0) Data Products

Description of the products: L0 data products are image data with all data transmission and formatting artefacts removed. These products are time provided, spatial, and band-sequentially ordered multispectral digital image data.

Level 1 Radiometric (L1R) Data Products

Description of the products: L1R data products consist of radiometrically corrected image data derived from L0 data scaled to at-aperture spectral radiance or reflectance.

Level 1 Systematic (L1G) Data Products

Description of the products: L1G data products consist of L1R data products with systematic geometric corrections applied and resampled for registration to a cartographic projection, referenced to the World Geodetic System 1984 (WGS84).

Level 1 Gt (L1Gt) Data Products

Description of the products: L1Gt data products consist of L1R data products with systematic geometric and terrain corrections applied and resampled for registration to a cartographic projection, referenced to WGS84.

Level 1 Terrain (L1T) Data Products

Description of the products: L1T data products consist of L1R data products with systematic geometric corrections applied, using Ground Control Points (GCPs) or onboard positional information to resample the image data for registration to a cartographic projection, referenced to WGS84. The data are also terrain corrected for relief displacement.

Level-2 Data Products

Description of the products: Surface Reflectance products are available on demand, courtesy of the USGS (U.S. Geological Survey).


They provide an estimate of the surface spectral reflectance as it would be measured at ground level in the absence of atmospheric scattering or absorption.

Landsat 8 Tier 1 data

Description of the products: These are the Landsat scenes with the highest available data quality and are considered suitable for time-series analysis. Tier 1 includes Level-1 Precision and Terrain (L1TP) corrected data that have well-characterized radiometry and are inter-calibrated across the different Landsat instruments.

Landsat 8 Tier 2 data

Description of the products: Landsat 8 Tier 2 products are those that do not meet the Tier 1 criteria during processing. Tier 2 includes Systematic Terrain (L1GT) and Systematic (L1GS) processed data, as well as any L1TP data that do not meet the Tier 1 specifications due to significant cloud cover, insufficient ground control, and other factors.

Landsat 8 Real-Time data

Description of the products: The Real-Time Tier contains data made available immediately after acquisition and processed with estimated parameters. Real-Time data are reprocessed and assessed for inclusion in Tier 1 or Tier 2 as soon as the final parameters are available.

Access Mechanism Landsat Level-1 data products are available for immediate download. There are several ways of accessing Landsat 8 Level-1 products:

· EarthExplorer (https://earthexplorer.usgs.gov/) provides a graphical user interface to define areas of interest (AOI) by place name, address or zip code, or by creating an AOI on the interactive map. Queries can be applied to multiple collections simultaneously. The Bulk Download Application is an easy-to-use tool for downloading large quantities of satellite imagery and geospatial data from EarthExplorer. Once scenes are added to a Bulk Order via EarthExplorer, the Bulk Download Application can retrieve them automatically with little to no user interaction, iterating through the scene list and downloading each scene until all have been processed;


· GloVis (https://glovis.usgs.gov/ );

· LandsatLook Viewer (https://landsatlook.usgs.gov/ ).

· Surface Reflectance and other Level-2 science

products are available on request through:

· USGS Earth Resources Observation and Science

(EROS) Center Science Processing Architecture (ESPA)

On Demand Interface;

· ESPA Application Programming Interface (API);

· EarthExplorer – allows ordering of only surface

reflectance (SR) data products.
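
The tier rules and the bulk-download behaviour described above can be sketched in a few lines. This is a simplified illustration, not USGS code: the real Tier 1 criteria involve detailed geometric and radiometric thresholds, which are collapsed here into a hypothetical `meets_tier1_specs` flag, and `download` stands for a hypothetical per-scene fetch function rather than the actual Bulk Download Application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Scene:
    scene_id: str
    processing_level: str    # "L1TP", "L1GT" or "L1GS"
    final_parameters: bool   # False while only estimated parameters exist
    meets_tier1_specs: bool  # hypothetical stand-in for the USGS quality checks


def assign_tier(scene: Scene) -> str:
    """Simplified reading of the tier rules described above."""
    if not scene.final_parameters:
        return "RT"  # Real-Time: waits for reprocessing with final parameters
    if scene.processing_level == "L1TP" and scene.meets_tier1_specs:
        return "T1"
    return "T2"      # L1GT, L1GS, or L1TP data that fail the Tier 1 specs


def bulk_download(scene_ids: Iterable[str],
                  download: Callable[[str], None],
                  max_attempts: int = 3) -> Dict[str, str]:
    """Walk the scene list until every scene has been processed.

    `download` is a hypothetical callable that fetches one scene and raises
    an exception on failure; each scene is tried at most `max_attempts` times.
    """
    status: Dict[str, str] = {}
    for scene_id in scene_ids:
        status[scene_id] = "failed"
        for _ in range(max_attempts):
            try:
                download(scene_id)
            except Exception:
                continue  # retry the same scene
            status[scene_id] = "done"
            break
    return status
```

Recording a status per scene, instead of stopping at the first error, mirrors the "little or no user interaction" behaviour: one failing scene does not block the rest of the order.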

5.1.4 Sentinel 3 OLCI (Ocean and Land Colour Instrument) data

Field Value

Internal Name of the Dataset Sentinel 3 OLCI

Name of the Dataset/API Provider ESA

Short Description The Sentinel-3 mission carries multiple instruments to measure sea-surface topography, sea- and land-surface temperature, and ocean- and land-surface colour, contributing to the Copernicus marine, land, atmosphere, emergency, security and cryosphere applications. It is based on a constellation of two identical satellites, Sentinel-3A and Sentinel-3B, launched separately.

Extended Description Primary geophysical products provided by the Sentinel-3 mission are:
· global coverage Sea Surface Height (SSH) for ocean and coastal areas;
· enhanced-resolution SSH products in coastal zones and sea-ice regions;
· global coverage Sea Surface Temperature (SST) and sea-Ice Surface Temperature (IST);
· global coverage ocean colour and water quality products;
· global coverage ocean surface wind speed measurements;
· global coverage significant wave height measurements;
· global coverage atmospheric aerosol, consistent over land and ocean;
· global coverage total column water vapour over land and ocean;
· global coverage vegetation products;
· global coverage land ice/snow surface temperature products;
· ice products (e.g., ice surface topography, extent, concentration).

Secondary geophysical products provided by the Sentinel-3 mission are:
· global coverage fire monitoring products (e.g., fire radiative power, burned area, risk maps);
· inland water (lakes and rivers) surface height data.

Geographical Coverage Each Sentinel-3 satellite has a repeat cycle of 27 days (385 orbits). Thanks to its field of view and 1270 km swath width, OLCI provides global coverage at the equator in 2–4 days with one satellite and in less than two days with two satellites.

Access Mechanism On 9 March 2018, full- and reduced-resolution Level-1 and Level-2 Sentinel-3 OLCI product dissemination units (PDUs) started being released through the Sentinel-3 Pre-Operational Data Hub.
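
The orbit figures above can be cross-checked with simple arithmetic. The sketch below uses only the 27-day/385-orbit repeat cycle and the 1270 km swath quoted above, plus the Earth's equatorial circumference; the coverage bound is an idealisation (one daylit pass per orbit, swaths laid edge to edge, no overlap), not ESA's coverage model.

```python
REPEAT_CYCLE_DAYS = 27     # repeat cycle quoted above
ORBITS_PER_CYCLE = 385     # orbits per repeat cycle quoted above
SWATH_KM = 1270.0          # OLCI swath width quoted above
EQUATOR_KM = 40075.0       # Earth's equatorial circumference

orbits_per_day = ORBITS_PER_CYCLE / REPEAT_CYCLE_DAYS
period_minutes = REPEAT_CYCLE_DAYS * 24 * 60 / ORBITS_PER_CYCLE

# Idealised lower bound: one useful (daylit) pass per orbit, swaths edge to edge.
min_days_one_sat = EQUATOR_KM / (orbits_per_day * SWATH_KM)
min_days_two_sats = min_days_one_sat / 2

print(f"{orbits_per_day:.2f} orbits/day, period {period_minutes:.1f} min")
print(f"equatorial coverage >= {min_days_one_sat:.1f} days (1 sat), "
      f">= {min_days_two_sats:.1f} days (2 sats)")
```

This gives about 14.26 orbits per day (an orbital period of about 101 minutes) and an idealised equatorial coverage of about 2.2 days with one satellite and 1.1 days with two. These figures are consistent with the "2–4 days" and "less than two days" quoted above, since real swaths overlap.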

5.1.5 Sentinel 3 SLSTR (Sea and Land Surface Temperature Radiometer)

Field Value

Internal Name of the Dataset Sentinel 3 SLSTR

Name of the Dataset/API Provider ESA

Short Description The Sentinel-3 mission carries multiple instruments to measure sea-surface topography, sea- and land-surface temperature, and ocean- and land-surface colour, contributing to the Copernicus marine, land, atmosphere, emergency, security and cryosphere applications.

Extended Description The main instruments of the Sentinel-3 mission are:
· Ocean and Land Colour Instrument (OLCI);
· Sea and Land Surface Temperature Radiometer (SLSTR);
· SAR Radar Altimeter (SRAL);
· MicroWave Radiometer (MWR);
· Precise Orbit Determination (POD) package, which consists of three instruments: DORIS, a Doppler orbitography and radio-positioning system; GNSS, a GPS receiver providing precise orbit determination and tracking multiple satellites simultaneously; and LRR, a Laser Retro-Reflector used to locate the satellite in orbit accurately.

The Sea and Land Surface Temperature Radiometer (SLSTR) is a dual-scan temperature radiometer selected for the ESA Sentinel-3 operational mission in low Earth orbit (800–830 km altitude), as part of the Copernicus programme (formerly Global Monitoring for Environment and Security, GMES). SLSTR is the successor of the (A)ATSR series (aboard the ERS and ENVISAT missions).

The main objective of SLSTR products is to provide global and regional Sea and Land Surface Temperature (SST, LST) to a very high level of accuracy (better than 0.3 K for SST) for both climatological and meteorological applications.

SLSTR is mostly known for its marine applications (SST), but it also provides information related to biomass burning (fire detection and classification). SLSTR also contributes to climate studies by providing several of the required Essential Climate Variables (ECVs) to the scientific community.

Geographical Coverage The mean global revisit time for dual-view SLSTR observations at the equator is 1.9 days with one operational spacecraft, or 0.9 days with two spacecraft in constellation (180° in-plane separation). Revisit frequency increases at higher latitudes due to orbital convergence.

Timespan

Access Level Sentinel-3 SLSTR products are made available systematically and free of charge to all data users, including the general public and scientific and commercial users.

Access Mechanism Sentinel-3A SLSTR data products are available via the Copernicus Open Access Hub.
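
As an illustration of programmatic access, the sketch below builds a search URL in the query syntax of the Copernicus Open Access Hub (DHuS OpenSearch) for SLSTR products over an area and time window. The base URL, the `SL_2_LST___` product type and the query keywords are given as we understand the hub at the time of writing. The hub required registered credentials and has since been retired in favour of the Copernicus Data Space Ecosystem, so treat this as a sketch of the query pattern rather than a working client.

```python
from urllib.parse import urlencode

# Assumed search endpoint of the Copernicus Open Access Hub (DHuS OpenSearch).
HUB_SEARCH_URL = "https://scihub.copernicus.eu/dhus/search"


def build_slstr_query(wkt_polygon: str, start: str, end: str,
                      product_type: str = "SL_2_LST___",
                      rows: int = 100, offset: int = 0) -> str:
    """Return a search URL for Sentinel-3 SLSTR products over an area and period."""
    query = " AND ".join([
        "platformname:Sentinel-3",
        f"producttype:{product_type}",              # e.g. Level-2 land surface temperature
        f'footprint:"Intersects({wkt_polygon})"',   # WKT polygon in lon/lat order
        f"beginposition:[{start} TO {end}]",        # ISO 8601 sensing-time window
    ])
    # urlencode percent-encodes the query; rows/start page through the results.
    return f"{HUB_SEARCH_URL}?{urlencode({'q': query, 'rows': rows, 'start': offset})}"
```

For example, `build_slstr_query("POLYGON((5 52,6 52,6 53,5 53,5 52))", "2018-06-01T00:00:00.000Z", "2018-06-30T23:59:59.999Z")` returns a URL for one month of Level-2 LST products over a small area; the response would then be fetched with an authenticated HTTP client.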

5.1.6 MODIS data

Field Value

Internal Name of the Dataset MODIS

Name of the Dataset/API Provider NASA

Short Description The Moderate Resolution Imaging Spectroradiometer (MODIS) is a scientific instrument (radiometer) on board NASA's Terra and Aqua satellites, launched in 1999 and 2002 respectively, to study the global dynamics of the Earth's atmosphere, land, ice and oceans.

Extended Description MODIS captures data in 36 spectral bands ranging in wavelength from 0.4 µm to 14.4 µm and at varying spatial resolutions (2 bands at 250 m, 5 bands at 500 m and 29 bands at 1 km), providing complete global coverage of the Earth every 1 to 2 days. Both Terra and Aqua are in sun-synchronous, near-polar (98°) orbits at 705 km altitude, with a descending local equatorial crossing time of 10:30 am for Terra and an ascending crossing time of 1:30 pm for Aqua.

MODIS Terra Global Level 3 Mapped Thermal SST products consist of sea surface temperature (SST) data derived from the 11 and 12 µm thermal infrared (IR) bands (MODIS channels 31 and 32). Daily, weekly (8-day), monthly and annual MODIS SST products are available at both 4.63 km and 9.26 km spatial resolution, for both daytime and night-time passes.

Rightsholder MODIS products are available courtesy of GSFC – NASA.

Geographical Coverage The Terra satellite crosses the equator from north to south in the morning, and Aqua crosses it from south to north in the afternoon, resulting in global coverage every 1 to 2 days.
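
SST retrievals from the 11 and 12 µm channels rely on the "split-window" difference between the two brightness temperatures to correct for atmospheric water-vapour absorption. To our understanding, the MODIS thermal SST product uses a non-linear split-window (NLSST) formulation of the general form sketched below. The coefficients are fitted against in-situ buoy data and are not reproduced here, so the `coeffs` values in any call are hypothetical placeholders.

```python
import math
from typing import Tuple


def nlsst(t11_c: float, t12_c: float, first_guess_sst_c: float,
          sat_zenith_deg: float,
          coeffs: Tuple[float, float, float, float]) -> float:
    """Non-linear split-window SST estimate in degrees Celsius.

    t11_c, t12_c: brightness temperatures of the ~11 and ~12 um channels
    (MODIS bands 31 and 32); first_guess_sst_c: a reference SST field;
    coeffs: (a0, a1, a2, a3), hypothetical here, normally fitted against buoys.
    """
    a0, a1, a2, a3 = coeffs
    dt = t11_c - t12_c                                              # split-window difference
    sec_term = 1.0 / math.cos(math.radians(sat_zenith_deg)) - 1.0   # path-length correction
    return a0 + a1 * t11_c + a2 * first_guess_sst_c * dt + a3 * sec_term * dt
```

The secant term grows with the satellite zenith angle, so the correction increases towards the swath edges, where the atmospheric path is longer.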

5.1.7 Proba-V data

Field Value

Internal Name of the Dataset Proba-V

Name of the Dataset/API Provider VITO

Short Description The Proba-V mission provides multispectral images to study the evolution of the vegetation cover on a daily and global basis. The 'V' stands for Vegetation. The mission extends the dataset of the long-established Vegetation instrument, flown as a secondary payload aboard France's SPOT-4 and SPOT-5 satellites, launched in 1998 and 2002 respectively. Proba-V has been developed within the framework of ESA's General Support Technology Programme (GSTP). The contributors to the Proba-V mission are Belgium, Luxembourg and Canada.

Extended Description Proba-V’s main applications are related to monitoring plant and forest growth, as well as inland water bodies. The Vegetation instrument can distinguish between different land cover types and plant species, including crops, to reveal their health, and can also detect water bodies and vegetation burn scars.

The VEGETATION instrument is pre-programmed with an indefinitely repeating sequence of acquisitions. This nominal acquisition scenario allows a continuous series of identical products to be generated, with the aim of mapping land cover and vegetation growth across the entire planet every two days.

Geographical Coverage Proba-V, developed as part of ESA's Proba Programme, is an ESA EO mission providing global coverage every two days: latitudes 35–75°N and 35–56°S are covered daily, and latitudes between 35°N and 35°S every two days.

Timespan

Access Level

Access Mechanism PROBA-V products can be ordered and downloaded from the PROBA-V Product Distribution Portal (PDP) at http://www.vito-eodata.be/. Products are usually available within 24 hours of sensing time (48 hours at most). Figure 8 shows the portal’s main page.

URI https://www.vito-eodata.be/PDF/portal/Application.html#Home
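
The latitude-dependent coverage stated in the Geographical Coverage row can be encoded as a small lookup, for instance to estimate how often a pilot site is observed. This sketch only restates that row. Latitudes of exactly ±35° are assigned to the daily band, which is an assumption, and latitudes outside the ranges mentioned return `None`.

```python
from typing import Optional


def nominal_revisit_days(lat_deg: float) -> Optional[int]:
    """Nominal Proba-V revisit interval for a latitude, per the coverage row above.

    Returns None outside the latitude ranges the coverage statement mentions.
    Boundary latitudes (exactly 35 degrees) are assigned to the daily band,
    which is an assumption: the source does not say which side they fall on.
    """
    if 35.0 <= lat_deg <= 75.0 or -56.0 <= lat_deg <= -35.0:
        return 1
    if -35.0 < lat_deg < 35.0:
        return 2
    return None
```

For example, a site at 50°N is nominally observed daily, while an equatorial site is observed every two days.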

5.1.8 Global Precipitation Measurement (GPM) mission data

Field Value

Internal Name of the Dataset

Name of the Dataset/API Provider

Short Description Global Precipitation Measurement (GPM) is an international satellite mission providing next-generation observations of rain and snow worldwide every three hours.

Extended Description NASA and the Japan Aerospace Exploration Agency (JAXA) launched the GPM Core Observatory satellite on 27 February 2014, carrying advanced instruments that set a new standard for precipitation measurements from space.

The foundation of the GPM mission is the Core Observatory satellite provided by NASA and JAXA. Data collected from the Core satellite serves as a reference standard that unifies precipitation measurements from research and operational satellites launched by a consortium of GPM partners in the United States, Japan, France, India, and Europe.

The Core satellite measures rain and snow using two science instruments: the GPM Microwave Imager (GMI) and the Dual-frequency Precipitation Radar (DPR). The GMI captures precipitation intensities and horizontal patterns, while the DPR provides insights into the three-dimensional structure of precipitating particles. Together, these two instruments provide a database of measurements against which other partner satellites’ microwave observations can be meaningfully compared and combined to build a global precipitation dataset.

Rightsholder NASA

Update Frequency The GPM constellation of satellites can observe precipitation over the entire globe every 2–3 hours.

Geographical Coverage The GPM constellation of satellites can observe precipitation over the entire globe every 2–3 hours.

Access Mechanism https://pmm.nasa.gov/data-access/downloads/gpm

URI https://pmm.nasa.gov/GPM

5.1.9 KNMI (Koninklijk Nederlands Meteorologisch Instituut) precipitation data

Field Value

Internal Name of the Dataset KNMI

Name of the Dataset/API Provider KNMI Data Centre (KDC)

Short Description The KNMI Data Centre (KDC) provides access to weather, climate and seismological datasets of KNMI (Koninklijk Nederlands Meteorologisch Instituut, the Royal Netherlands Meteorological Institute).

Extended Description The primary tasks of KNMI are weather forecasting and the monitoring of climate change and seismic activity. KNMI is also the national research and information centre for climate, climate change and seismology.

Rightsholder KDC

Geographical Coverage KNMI Products cover the Netherlands and surrounding areas.

Access Mechanism Access to most KNMI datasets is unrestricted and provided under the 'OpenData' policy of the Dutch government. The specific KNMI precipitation dataset described in this document is free of charge, but registration is required. The Multisensor Evolution Analysis (MEA) technology (DataBio component C41.01) provides access to this KNMI precipitation data.

5.1.10 CMEMS (Copernicus Marine Environment Monitoring Service) data

Field Value

Internal Name of the Dataset CMEMS

Name of the Dataset/API Provider Copernicus

Short Description The Copernicus Marine Environment Monitoring Service (CMEMS) provides regular and systematic core reference information on the state of the physical oceans and regional seas. The observations and forecasts produced by the service support all marine applications.

Extended Description Since May 2015, the Copernicus Marine Environment Monitoring Service (CMEMS) has been running in operational mode. It follows the MyOcean demonstration phase, during which the service ran in pre-operational mode for 6 years.

The service is intended for any user requesting generic information on the ocean, and especially for downstream service providers who use this information as input to their own value-added services for end users. CMEMS can be defined as:
• an integrated service;
• an open and free service;
• providing access to a single catalogue of products;
• a reliable service;
• a sustainable service.

Data Type Copernicus Marine products are delivered in netCDF format (.nc). They can easily be downloaded through the CMEMS interface. Data are directly available through services such as the Catalogue Service for the Web (CSW), Web Map Service (WMS), Subsetter, Direct Get File and FTP.

Access Mechanism To provide standard access to the CMEMS products, the FedEO Gateway (C07.01) has been extended with an additional connector for the CMEMS web service interface. In this way, CMEMS products can be retrieved and downloaded through the FedEO Gateway (C07.01) and the Data Manager (C07.04), via the same standard OpenSearch interface, compliant with OGC 13-026r8, as other EO products such as Sentinel products.

URI
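
OpenSearch interfaces such as the one described above (OGC 13-026r8 builds on OpenSearch 1.1) advertise their query parameters through URL templates in a description document. The client fills these templates, for example `{time:start?}` or `{geo:box?}`. The sketch below shows that substitution step. The example endpoint and parameter layout are hypothetical, not the actual FedEO template, and namespace prefixes are matched literally rather than resolved to namespace URIs.

```python
import re
from typing import Dict
from urllib.parse import quote

# Matches OpenSearch 1.1 template parameters such as {searchTerms}, {time:start?}.
_PARAM = re.compile(r"\{([^{}]+)\}")


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Expand an OpenSearch URL template.

    Supplied values are percent-encoded; optional parameters (trailing '?')
    without a value become empty strings; a missing required one raises KeyError.
    Namespace prefixes are matched literally, a simplification of the real rules,
    where prefixes are bound to namespace URIs in the description document.
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        optional = name.endswith("?")
        name = name.rstrip("?")
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)
```

For a hypothetical template `https://example.org/opensearch?pi={eo:parentIdentifier}&bbox={geo:box?}`, passing `{"eo:parentIdentifier": "CMEMS_PRODUCT", "geo:box": "5,52,6,53"}` yields `...pi=CMEMS_PRODUCT&bbox=5%2C52%2C6%2C53`. Because both CMEMS and Sentinel collections are exposed through the same template mechanism, one client can query both.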

5.1.11 Sentinel 2A (ESA D11.01)

Field Value

Internal Name of the Dataset D11.01

Name of the Dataset/API Provider Sentinel 2A

Short Description Sentinel 2B data provided by ESA. Multiple geographical areas and various times.

Extended Description https://sentinel.esa.int/web/sentinel/sentinel-data-access
https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/APIHubDescription

Data Type EO data

Rightsholder ESA. License CC-BY.

Dataset/API Owner/Responsible ESA

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System Sentinel

Dataset Data Model/API Interface REST

Data Model: Standards, Glossaries and metadata standards SENTINEL SAFE

Data Volume ~GB

Update Frequency Every 5 days

Data Archiving and preservation Locally on TRAGSATEC premises

Geographical Coverage Extremadura, Galicia

Timespan 2016 - End of the Project
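
Sentinel-2 products in SENTINEL-SAFE format are distributed as `.SAFE` folders whose names encode the mission, processing level, sensing time, processing baseline, relative orbit and MGRS tile. When managing a local archive like the one described above, it is often convenient to index products by these fields. The sketch below parses the compact naming convention used for Sentinel-2 products since late 2016. Older products with the longer legacy names are deliberately rejected, and the field names in the returned dictionary are our own choice.

```python
import re
from datetime import datetime
from typing import Dict, Union

# Compact naming convention of Sentinel-2 Level-1C/Level-2A products, e.g.
# S2A_MSIL1C_20170105T013442_N0204_R031_T53NMJ_20170105T013443.SAFE
_S2_NAME = re.compile(
    r"^(?P<mission>S2[AB])_MSI(?P<level>L1C|L2A)_(?P<sensing>\d{8}T\d{6})"
    r"_N(?P<baseline>\d{4})_R(?P<orbit>\d{3})_T(?P<tile>\d{2}[A-Z]{3})"
    r"_(?P<discriminator>\d{8}T\d{6})\.SAFE$"
)


def parse_s2_safe_name(name: str) -> Dict[str, Union[str, int, datetime]]:
    """Split a compact-convention Sentinel-2 SAFE product name into its fields."""
    m = _S2_NAME.match(name)
    if m is None:
        raise ValueError(f"not a compact Sentinel-2 SAFE name: {name}")
    return {
        "mission": m["mission"],
        "level": m["level"],
        "sensing_start": datetime.strptime(m["sensing"], "%Y%m%dT%H%M%S"),
        "processing_baseline": m["baseline"],
        "relative_orbit": int(m["orbit"]),
        "tile": m["tile"],  # MGRS tile identifier
    }
```

Grouping parsed names by `tile` and `sensing_start` makes it straightforward to check whether the expected acquisitions for each pilot tile are present.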

5.1.12 Sentinel-2 Data

Field Value

Internal Name of the Dataset D14.01, D14.02

Name of the Dataset/API Provider Sentinel-2 Data

Short Description
· Sentinel-2 L1 data (C14.01). Sentinel-2 L1 data archive. ESA. Czech Republic
· Sentinel-1 IWS data (C14.02). Sentinel-1 L1 data archive. EO data. Czech Republic
· Sentinel-2 HR Optical data (C14.03). Sentinel-2 archive. European Space Agency (ESA). Global coverage

Extended Description NP has the data for its pilot areas (T1.2.1, T1.4.1, T1.4.2), corresponding to 6 tiles. Thematic Exploitation Platforms, such as the Forestry TEP (C16.10), are available for online analytics.

Rightsholder ESA

Data Model: Standards, Glossaries and metadata standards SENTINEL-SAFE format

Data Volume L1 data: Approximately 6Gb per scene

Update Frequency L1: 10-day revisit time, improving to 5 days in Q2 2017

Geographical Coverage
· Sentinel-2 L1 data (C14.01). Sentinel-2 L1 data archive. ESA. Czech Republic
· Sentinel-1 IWS data (C14.02). Sentinel-1 L1 data archive. EO data. Czech Republic
· Sentinel-2 HR Optical data (C14.03). Sentinel-2 archive. European Space Agency (ESA). Global coverage.

Timespan L1: June 2015 - now

5.1.13 Sentinel 3 SRAL (Synthetic Aperture Radar Altimeter) data

5.1.14 Sentinel 3 MWR (Microwave Radiometer) data

5.2 Datasets improved by DataBio

This section presents datasets that are improved by DataBio through processing or other data management mechanisms.

5.2.1 RPAS (Remotely Piloted Aircraft Systems) data

Field Value

Internal Name of the Dataset RPAS

Name of the Dataset/API Provider TRAGSA

Short Description RPAS data, property of TRAGSA, are provided according to the pilot needs. The images are acquired in 6 spectral bands (RGB, Red Edge, NIR and Thermal), and point clouds are also provided.

Extended Description The delivery of RPAS imagery started in October 2017. The areas covered are small parcels (of the order of hectares) within pilot areas in the Iberian Peninsula, Spain (Extremadura, Andalucia, Castilla y León, Castilla La Mancha, Madrid). RPAS imagery is stored on TRAGSA premises.

Timespan From 2017

5.2.2 Orthophotos

Field Value

Internal Name of the Dataset Orthophotos

Name of the Dataset/API Provider TRAGSA

Short Description The National Geographic Information Centre of Spain provides a mosaic of the latest orthophotos of the National Plan for Aerial Orthophotography.

Extended Description They are delivered in the ETRS89 (European Terrestrial Reference System 1989) datum for the Iberian Peninsula, the Balearic Islands, Ceuta and Melilla, and in WGS84 for the Canary Islands, in each case in the UTM projection of the corresponding zone.

Each unit (mosaic) covers an MTN50 sheet (National Topographic Map at 1:50 000 scale).

All datasets are processed by TRAGSA to produce improved images. Specifically, orthophotos will be transformed by an orthorectification method developed under WP5. Component C11.03 – Radiometric Corrections is a tool that performs colour correction and homogenization of orthophotos from different areas and/or dates. It increases the homogeneity of the orthophotos and improves their suitability for subsequent use, for both agricultural and environmental purposes, through automated image analysis processes.

Access Mechanism Orthophotos are provided by the Spanish National Geographic Institute at http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do
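
Because the mosaics are delivered in different datums and UTM zones, a processing chain has to pick the right coordinate reference system for each sheet. The sketch below derives the standard UTM zone from longitude and maps it to the EPSG codes for ETRS89 / UTM (258xx) or WGS 84 / UTM northern hemisphere (326xx), following the datum rule stated above. It is a simplification: sheets near a zone boundary may be delivered in a neighbouring zone, so the zone actually recorded in each file's metadata should take precedence.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Standard 6-degree UTM zone number for a longitude in [-180, 180)."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) + 1


def orthophoto_epsg(lon_deg: float, canary_islands: bool = False) -> int:
    """EPSG code of the CRS expected for a mosaic centred at lon_deg."""
    zone = utm_zone(lon_deg)
    if canary_islands:
        return 32600 + zone  # WGS 84 / UTM zone <zone>N
    return 25800 + zone      # ETRS89 / UTM zone <zone>N
```

For example, a sheet around Madrid (longitude about 3.7°W) falls in zone 30 and maps to EPSG:25830, while a Canary Islands sheet around 15.4°W falls in zone 28 and maps to EPSG:32628.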

5.2.3 gaiasense field (D13.01)

Field Value

Internal Name of the Dataset D13.01

Name of the Dataset/API Provider gaiasense field

Short Description Dataset composed of measurements from NP’s telemetric IoT agro-climate stations, called GAIAtrons.

Extended Description Dataset composed of field-sensing measurements from NP’s network of telemetric IoT stations, called GAIAtrons. GAIAtrons offer configurable data collection and transmission rates and come in two variants: GAIAtron Atmo stations measure atmospheric parameters (e.g., ambient temperature, humidity, wind speed and direction, solar irradiance), whereas GAIAtron Soil stations measure soil parameters (e.g., multi-depth soil temperature and humidity). The area covered by each station varies, and the spatial distribution of the stations is influenced by the microclimatic variability of the monitored area.

Version 1.0

Initial Availability Date Beginning of 2016

Data Type Sensor measurements (numerical data) and metadata (timestamps, sensor id, etc.)

Personal Data No personal data is recorded and/or stored

Rightsholder NP

Dataset/API Owner/Responsible NP

Dataset/API Owner/Responsible Contacts [email protected]

Technology Node.js, Python, Apache, Linux, MySQL, JSON

Name of the System GAIAtrons (IoT telemetry stations for in-field measurement collection)

GAIABus DataSmart RealTime Subcomponent (for cloud-based monitoring, validation, parsing and cross-checking of the incoming data streams)

Dataset Data Model/API Interface

Data Model: Standards, Glossaries and metadata standards No standards are being used in glossaries and metadata

Data Identifier - Standard used No standards are being used

Data Model - Specific Data Model Custom data model that is designed to optimally address the needs of the offered smart farming applications

Data Volume several GBs/year

Update Frequency The update frequency depends on the velocity of the incoming data streams. GAIAtrons offer configurable data collection and transmission rates, per station and monitored parameter, based on the needs of the application.

Data Archiving and preservation Data is preserved in local warehouses

Geographical Coverage Greek Pilot Areas (DataBio Pilots A1.1, B1.2, C1.1, C2.2)

Timespan 2016 until now

Access Level Restricted

Access Mechanism Query
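
The validation and parsing role of the GAIABus DataSmart RealTime Subcomponent can be illustrated with a minimal sketch. The actual GAIAtron message format and checks are a custom, unpublished data model. Everything below is hypothetical: the JSON field names (`sensor_id`, `ts`, `parameter`, `value`), the parameter names and the plausibility ranges are assumptions for illustration only.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical plausibility ranges per parameter; not the real GAIABus rules.
VALID_RANGES = {
    "air_temperature_c": (-40.0, 60.0),
    "relative_humidity_pct": (0.0, 100.0),
    "wind_speed_ms": (0.0, 75.0),
    "soil_temperature_c": (-30.0, 70.0),
}


@dataclass
class Measurement:
    sensor_id: str
    timestamp: datetime
    parameter: str
    value: float


def parse_measurement(raw: str) -> Measurement:
    """Parse one hypothetical JSON message and reject implausible values."""
    msg = json.loads(raw)
    parameter = msg["parameter"]
    value = float(msg["value"])
    low, high = VALID_RANGES[parameter]  # KeyError for unknown parameters
    if not low <= value <= high:
        raise ValueError(f"{parameter}={value} outside [{low}, {high}]")
    # "ts" is assumed to be a Unix epoch in seconds (UTC).
    timestamp = datetime.fromtimestamp(int(msg["ts"]), tz=timezone.utc)
    return Measurement(str(msg["sensor_id"]), timestamp, parameter, value)
```

Rejecting a message at ingestion, rather than storing it and filtering later, keeps the stored time series clean. Cross-checking against neighbouring stations, also mentioned above, would be a separate step.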

5.2.4 Land use and properties - Greek agriculture pilots (NP - D13.02)

Anonymised IACS data

Field Value

Internal Name of the Dataset D13.02

Name of the Dataset/API Provider Land use and properties - Greek agriculture pilots

Short Description Dataset comprising agricultural parcel positions, expressed as vectors, along with several attributes and extracted multi-temporal vegetation indices associated with them.

Extended Description Dataset comprising thousands of agricultural parcel positions, expressed as vectors, along with several attributes, including cultivated crop types, variety codes and descriptions. Further, each object/parcel has been assigned several statistical descriptors of different vegetation indices, such as NDVI, NDWI and SAVI, that capture its status at various points in time.

Version 1.0

Initial Availability Date Beginning of 2016

Data Type Parcel geometries (WKT), alphanumeric parcel-related data and metadata (e.g., timestamps)

Personal Data The dataset has been pseudonymized: the most revealing fields within a data record (farmers’ identifiers) have been replaced by artificial identifiers (parcel id). Pseudonymization still allows the data to be traced back to their origin, which is needed because the goal is to provide smart farming services to the farmers. However, after this process, personal data can no longer be attributed to a specific data subject without the use of additional information. Fully in line with the GDPR, NP keeps this additional information separately and has established all the technical and organizational measures needed to ensure that the personal data are not directly attributed to an identified or identifiable natural person.

Rightsholder NP

Dataset/API Owner/Responsible NP

Dataset/API Owner/Responsible Contacts [email protected]

Technology PostgreSQL, Python

Name of the System

Dataset Data Model/API Interface

Data Model: Standards, Glossaries and metadata standards No standards are being used in glossaries and metadata

Data Identifier - Standard used No standards are being used

Data Model - Specific Data Model Custom data model that is designed to optimally address the needs of the offered smart farming applications

Data Volume several GBs/year

Update Frequency Periodically. The update frequency depends on the velocity of the incoming EO data streams and on the assignment of vegetation index statistics to each parcel. Currently, new Sentinel-2 products are available approximately every 5 days, and the dataset is updated at regular intervals.

Data Archiving and preservation

Geographical Coverage Several areas within the Greek territory, including DataBio Pilots A1.1, B1.2, C1.1 and C2.2.

Timespan 2016 until now

Access Level Restricted

Access Mechanism Query

URI
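
The two processing steps described for this dataset, per-parcel vegetation-index statistics and pseudonymization, can be sketched as follows. The index formulas are the standard ones (NDVI; NDWI in the green/NIR form of McFeeters, although a NIR/SWIR variant by Gao is also common; SAVI with soil factor L = 0.5). The choice of statistics and all field names (`nir`, `red`, `green`, `farmer_id`, `artificial_id`) are hypothetical. The pseudonymization function shows the principle stated above: random artificial identifiers replace farmer identifiers, and the re-identification table is kept apart from the data. It is not NP's actual procedure.

```python
import secrets
import statistics
from typing import Dict, List


def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)


def ndwi(green: float, nir: float) -> float:
    # McFeeters form; a NIR/SWIR variant (Gao) is also in common use.
    return (green - nir) / (green + nir)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def parcel_descriptors(pixels: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summary statistics of each index over the reflectance pixels of one parcel."""
    indices = {
        "NDVI": [ndvi(p["nir"], p["red"]) for p in pixels],
        "NDWI": [ndwi(p["green"], p["nir"]) for p in pixels],
        "SAVI": [savi(p["nir"], p["red"]) for p in pixels],
    }
    return {name: {"mean": statistics.fmean(vals), "std": statistics.pstdev(vals),
                   "min": min(vals), "max": max(vals)}
            for name, vals in indices.items()}


def pseudonymize(records: List[dict], key_table: Dict[str, str]) -> List[dict]:
    """Replace farmer identifiers with random artificial IDs.

    key_table (artificial ID -> farmer ID) is the "additional information"
    and must be stored separately from the pseudonymized records.
    """
    out = []
    for rec in records:
        artificial_id = secrets.token_hex(8)  # random, not derived from the farmer ID
        key_table[artificial_id] = rec["farmer_id"]
        clean = {k: v for k, v in rec.items() if k != "farmer_id"}
        clean["artificial_id"] = artificial_id
        out.append(clean)
    return out
```

Using random tokens instead of a hash of the farmer identifier means the pseudonyms cannot be reversed by guessing identifiers; re-identification is possible only through the separately held key table.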

5.2.5 Customer and forest estate data (METSAK - D18.02)

Field Value

Internal Name of the Dataset D18.02

Name of the Dataset/API Provider Customer and forest estate data / Metsään.fi

Short Description The forest resource data is connected with METSAK's customer and forest estate data. An essential part of using the Metsään.fi eService is the information on who owns a given forest estate and who has the right to read and use the forest resource data of a given forest owner. The pilot uses METSAK’s customer information system, which contains all this data.

Version XML-file versions 1.4, 1.5, 1.6, 1.7

Initial Availability Date Year 2012 onwards

Data Type Relational database

Personal Data Private Forest Owners and Forest Service Providers

Rightsholder METSAK

Dataset/API Owner/Responsible Metsään.fi data / Anu Kosunen

Dataset/API Owner/Responsible Contacts XML data / Anu Kosunen/[email protected]

Technology The XML writer provides the standardized XML data from the Forest Resource Database

Name of the System Metsään.fi

Dataset Data Model/API Interface Metsään.fi user interface, with Web Service and SOAP interfaces in the background.

Data Model: Standards, Glossaries and metadata standards XML standards https://www.metsatietostandardit.fi/en/

Data Identifier - Standard used XML

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/

Data Volume 450 GB

Update Frequency Continuous updates as needed.

Data Archiving and preservation N/A

Geographical Coverage Finland

Timespan from 2012 onwards

Access Level Registered users: Private Forest Owners and Forest Service Providers

Access Mechanism https://tunnistaminen.suomi.fi

URI https://www.metsaan.fi/

5.3 New Datasets created during DataBio

5.3.1 Canopy height map (FMI - D14.05)

Field Value

Internal Name of the Dataset D14.05

Name of the Dataset/API Provider Canopy height map

Short Description Stand age (growth stages) according to a canopy height model derived from the interpretation of aerial stereo-orthophotos from the Czech Land Survey (data available countrywide every second year). Spatial resolution 5 m. Four growth stages are distinguished, as well as absolute canopy height.

Extended Description Canopy height map at 20 m resolution; the pixel value corresponds to the height of the dominant tree species.

Initial Availability Date Will be prepared in Q3 2017

Data Type GeoTiff

Rightsholder Property of FMI

Dataset/API Owner/Responsible Raster dataset

Dataset/API Owner/Responsible Contacts [email protected]

Data Volume 4 GB

Update Frequency Fixed

Geographical Coverage Czech Republic

Timespan 2017
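As a rough illustration of how a canopy height raster can be turned into the four growth stages mentioned above, the sketch below bins each pixel's height with fixed breakpoints. The breakpoints and stage labels are hypothetical placeholders: the thresholds actually used for D14.05 are not stated in this document, and reading the GeoTiff itself is left out.

```python
from bisect import bisect_right

# HYPOTHETICAL height breakpoints (metres) separating four growth stages;
# the thresholds used for the D14.05 product are not given in the source.
BREAKPOINTS_M = [2.0, 8.0, 18.0]
STAGES = ["stage 1", "stage 2", "stage 3", "stage 4"]


def growth_stage(canopy_height_m: float) -> str:
    """Map one canopy-height pixel value to one of four growth stages."""
    if canopy_height_m < 0:
        raise ValueError("canopy height cannot be negative")
    # bisect_right returns 0..3, i.e. how many breakpoints are <= the height
    return STAGES[bisect_right(BREAKPOINTS_M, canopy_height_m)]


def classify_grid(grid):
    """Classify a 2-D list of canopy heights, e.g. values read from a GeoTiff."""
    return [[growth_stage(h) for h in row] for row in grid]
```

In a real workflow the breakpoints would be replaced by the ones defined for the product, and the classification would be applied to the raster array rather than to nested lists.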

5.3.2 Orthophotos - (IGN - D11.02)

Field Value

Internal Name of the Dataset D11.02

Name of the Dataset/API Provider IGN Orthophotos

Short Description Orthophotos provided by the Spanish National Geographic Institute.

Extended Description Multiple geographical areas and acquisition dates. Orthophotos from PNOA (coverage of Spain), RGB and NIR bands, GSD = 25 cm, RMSE < 0.5 m.

Initial Availability Date 2006

Data Type Images (WMS, PNG…)

Personal Data No

Rightsholder IGN. License CC-BY.

Dataset/API Owner/Responsible IGN

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System Sentinel

Dataset Data Model/API Interface REST

Data Volume Terabytes

Update Frequency Yearly

Geographical Coverage All of Spain

Timespan 2016 to the end of the project

Access Level Free

URI http://centrodedescargas.cnig.es/CentroDescargas/catalogo.do#selectedSerie
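Since the orthophotos are delivered as WMS images with a 25 cm GSD, a client can size a GetMap request so that one image pixel matches the native ground sampling distance. A minimal sketch, assuming a projected CRS in metres (EPSG:25830, ETRS89 / UTM zone 30N, is used as an example); the endpoint URL and layer name are hypothetical placeholders, not the actual IGN/PNOA service values.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: replace with the real WMS service address.
WMS_ENDPOINT = "https://example.org/wms"


def getmap_url(bbox, gsd_m=0.25, layer="hypothetical_ortho_layer",
               crs="EPSG:25830", fmt="image/png"):
    """Build a WMS 1.3.0 GetMap URL whose pixel size equals the GSD.

    bbox is (minx, miny, maxx, maxy) in metres of a projected CRS, so the
    image size in pixels is simply the ground extent divided by the GSD.
    """
    minx, miny, maxx, maxy = bbox
    width = round((maxx - minx) / gsd_m)
    height = round((maxy - miny) / gsd_m)
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": crs,
        "BBOX": f"{minx},{miny},{maxx},{maxy}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{WMS_ENDPOINT}?{urlencode(params)}"
```

For a 1 km × 1 km tile, 1000 m / 0.25 m gives a 4000 × 4000 pixel image. Many WMS servers cap WIDTH and HEIGHT, so larger areas would in practice be requested as several smaller tiles.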

5.3.3 GEOSS sources (D11.03)

Field Value

Internal Name of the Dataset D11.03

Name of the Dataset/API Provider GEOSS Sources

Data Type EO data

Dataset/API Owner/Responsible TRAGSA-TRAGSATEC

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System GEOSS

5.3.4 RPAS data (Tragsa - D11.04)

Field Value

Internal Name of the Dataset D11.04

Name of the Dataset/API Provider RPAS

Short Description RPAS data and images

Extended Description RGB, multispectral (6 bands: RGB + Red Edge + NIR), thermal and point-cloud data. Spatial features TBD according to the pilot needs.

Version

Initial Availability Date October 2017

Data Type Images: TIFF and JPEG

Personal Data No

Rightsholder Under agreement. Property of TRAGSA Group

Other Rights Information N/A

Dataset/API Owner/Responsible TRAGSA Group

Dataset/API Owner/Responsible Contacts [email protected]

Dataset Data Model/API Interface No API interface; files in local folders.

Data Identifier - Standard used TIFF, JPEG

Data Model - Specific Data Model .TIFF, .JPG, .LAS

Data Volume 60 GB

Update Frequency 1-2 times per year

Data Archiving and preservation Stored on TRAGSA premises

Geographical Coverage Small parcels (on the order of hectares) within the pilot areas

Timespan According to pilot needs

Access Level Private

Access Mechanism On request

URI None

5.3.5 MFE Spanish Forest Map (D11.06)

Field Value

Internal Name of the Dataset D11.06

Name of the Dataset/API Provider MFE50

Short Description Mapa Forestal de España (MFE) - Spanish Forest Map

Extended Description MAPAMA (Spanish Ministry of Agriculture, Fisheries, Food and Environment)

Initial Availability Date From 1997

Data Type ESRI Shapefile

Personal Data No

Rightsholder Free

Other Rights Information MAPAMA (Spanish Ministry of Agriculture, Fisheries, Food and Environment)

Dataset/API Owner/Responsible TRAGSA-TRAGSATEC

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System MFE (Spanish Forest Map)

Dataset Data Model/API Interface http://www.mapama.gob.es/es/biodiversidad/servicios/banco-datos-naturaleza/informacion-disponible/mfe50_descargas_comunidad_madrid.aspx

Data Model: Standards, Glossaries and metadata standards Cartography, vectors

Data Model - Specific Data Model ESRI Shapefile

Data Volume ~MB

Update Frequency Every 10 years

Geographical Coverage Spain

Timespan From 1997, updated every 10 years

Access Level Open access; specific license not defined

5.3.6 Field data - pilot B2 (Tragsa - D11.07)

Field Value

Internal Name of the Dataset D11.07

Name of the Dataset/API Provider Field data - pilot B2

Short Description Data acquired by IoT sensors, and scientific data from field samples.

Extended Description Direct observations and direct and lab measurements: chlorophyll content, morphology, green and dry weight, hydric potential, Leaf Area Index (LAI), visual classification of damages. Features TBD according to the pilot needs.

Rightsholder Under agreement. Property of TRAGSA Group

Dataset/API Owner/Responsible Contacts [email protected]

Data Identifier - Standard used CSV

Data Volume ~MB

Update Frequency Daily

Data Archiving and preservation TRAGSA premises

Geographical Coverage Study sites (TBD) in Extremadura and Galicia

Timespan Specific dates TBD according to the pilot needs: 2017-2019

Access Level TRAGSA-TRAGSATEC

5.3.7 Forest damage (FMI - D14.07)

Field Value

Internal Name of the Dataset D14.07

Name of the Dataset/API Provider Forest damage

Short Description In-situ observations of forest damage (FMI, Czech Republic). Forestry statistics for selected plots, including the amount of salvage cutting.

Extended Description Derived from the Wuudis mobile application

Initial Availability Date 2017

Data Type Photography, numeric values

Rightsholder FMI

Dataset/API Owner/Responsible Contacts [email protected]

Name of the System

Dataset Data Model/API Interface SQL, REST

Data Model: Standards, Glossaries and metadata standards GeoTiff, CSV

Data Volume Gigabytes

Update Frequency Based on field campaigns

Geographical Coverage Czech Republic

5.3.8 Open Forest Data (METSAK - D18.01)

Field Value

Internal Name of the Dataset D18.01

Name of the Dataset/API Provider Open Forest Data / METSAK

Short Description The pilot uses METSAK’s forest resource data on privately owned Finnish forests, taken from METSAK’s forest resource data system. The forest resource data consists of basic tree stand data (development class, dominant tree species, scanned height, scanned intensity, stand measurement date), tree stand strata (mean age, basal area, number of stems, mean diameter, mean height, total volume, volume of logwood, volume of pulpwood), growth place data (classification, fertility class, soil type, drainage state, ditching year, accessibility, growth place data source, growth place data measurement date), geometry and compartment numbering. The forest resource data is available in a standard format for external use with the consent of the forest owner.

Extended Description Forest resources are inventoried once a decade for each area, using remote sensing (airborne laser scanning) and aerial photographs. The new data is analysed and, in some parts, measured in the field. Other updates to the forest resource data are yearly growth calculations, possible notifications of forest use or other forestry operations, so-called Kemera financing operations, and possible new aerial photographs to be interpreted.

Version OGC GeoPackage 1.2 with RTree

XML version 1.7

Initial Availability Date 1.3.2018 (download services), Q2/2018 (APIs)

Data Type Open forest data including forest resource data as well as GIS data

Personal Data N/A

Rightsholder METSAK

Dataset/API Owner/Responsible METSAK Open forest data/ Juha Inkilä

Dataset/API Owner/Responsible Contacts METSAK Open forest data (Avoin metsätieto) / METSAK / [email protected]

Technology WMS, WFS and REST

Name of the System Open forest data (Avoin metsätieto)

Dataset Data Model/API Interface XML standard / REST; OGC GeoPackage standard / WFS, WMS from Oracle database

Data Model: Standards, Glossaries and metadata standards https://www.metsatietostandardit.fi/en/

Data Identifier - Standard used XML, OGC, WFS, WMS, REST

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/

Data Volume 276.8 GB as of June 2018

Update Frequency Daily

Geographical Coverage Finland

Timespan From 1.3.2018 onwards

Access Level Open

URI https://www.metsaan.fi/rajapinnat
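Open forest data are also exposed through WFS. A WFS 2.0 GetFeature request is a plain key-value URL, so a client only needs to assemble the standard parameters. The sketch below builds one, optionally restricted to a bounding box in ETRS-TM35FIN (EPSG:3067), the Finnish national projection. The endpoint and the feature type name are hypothetical placeholders; the real values would come from the service's GetCapabilities response.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: not the actual metsaan.fi WFS address.
WFS_ENDPOINT = "https://example.org/wfs"


def getfeature_url(type_name, bbox=None, count=100, srs="EPSG:3067"):
    """Build a WFS 2.0.0 GetFeature URL (KVP encoding).

    type_name: feature type to fetch (placeholder name in the example).
    bbox:      optional (minx, miny, maxx, maxy) in the CRS given by srs.
    count:     maximum number of features to return.
    """
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
        "COUNT": count,
    }
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        # WFS 2.0 allows the CRS as an optional fifth BBOX value
        params["BBOX"] = f"{minx},{miny},{maxx},{maxy},{srs}"
    return f"{WFS_ENDPOINT}?{urlencode(params)}"
```

The same pattern works for WMS GetMap requests; only the REQUEST value and the parameter set change.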

5.3.9 Hyperspectral image orthomosaic (Senop - D44.02)

Field Value

Internal Name of the Dataset D44.02

Name of the Dataset/API Provider Hyperspectral image orthomosaic

Short Description Orthorectified, band-matched hyperspectral mosaic with n bands.

Data Type ENVI, multipage TIF or single-band TIF
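In the ENVI format the raster bands sit in a binary file accompanied by a plain-text .hdr header that starts with the line ENVI and lists key = value pairs (for example samples, lines, bands and, for hyperspectral data, wavelength); list values are enclosed in braces and may span several lines. A minimal header reader, written as an illustrative sketch rather than a complete implementation of the format:

```python
def parse_envi_header(text: str) -> dict:
    """Parse an ENVI .hdr header into a dict with lower-case keys.

    Scalar values are returned as strings; brace-enclosed values such as
    'wavelength = {500.0, 600.0}' are returned as lists of strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header: first line must be 'ENVI'")
    header = {}
    key, buf = None, []
    for line in lines[1:]:
        if key is not None:                 # inside a multi-line {...} value
            buf.append(line.strip())
            if line.rstrip().endswith("}"):
                header[key] = _finish(" ".join(buf))
                key, buf = None, []
            continue
        if "=" not in line:
            continue                        # skip blank or comment lines
        k, v = (part.strip() for part in line.split("=", 1))
        if v.startswith("{") and not v.endswith("}"):
            key, buf = k.lower(), [v]       # value continues on later lines
        else:
            header[k.lower()] = _finish(v)
    return header


def _finish(value: str):
    """Turn '{a, b}' into ['a', 'b']; leave scalar values unchanged."""
    if value.startswith("{") and value.endswith("}"):
        return [item.strip() for item in value[1:-1].split(",") if item.strip()]
    return value
```

Splitting on commas is adequate for numeric lists such as band wavelengths, but free-text fields like description would need more careful handling.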

5.3.10 Leaf area index (FMI - D14.06)

Field Value

Internal Name of the Dataset D14.06

Name of the Dataset/API Provider Leaf area index

Short Description Leaf area index (LAI) and canopy closure for selected National Forest Inventory sites in the Czech Republic, based on the interpretation of digital hemispherical photos (2457 images in total, collected at 189 sites). Provided as the input hemispherical photos plus a vector point layer with the forest plot centroids and LAI values in the attribute table.

Extended Description In-situ sampling of digital hemispherical photographs (DHP) followed a scheme that takes into account the Sentinel-2 spatial resolution (20 m pixel size), while the number of photos and their spatial layout were selected according to Majasalmi et al. (2012): a star shape with 13 sampled points along the four principal azimuths (north, south, east and west), with sampled points positioned 3 meters apart.

(Figure: Sampling scheme for digital hemispherical photography.)

The images were taken with a Nikon D5500 digital SLR camera with a Sigma 4.5 mm circular fisheye lens. The camera was mounted on a Vanguard Espod CX203 AP tripod and levelled horizontally with a two-axis level. All photos were shot with the lens facing north and saved as uncompressed RAW images. In total, 189 forest plots were sampled, of which 79 stands were dominated by coniferous trees (42% of the samples) and 110 by deciduous trees (58% of the samples). In 2016 and 2017, all field plots were visited during the period of maximum foliage (June to August); 2015 was a test period in which photos were taken only for evergreen coniferous plots, mostly in October.

All DHP photos were analysed in the Hemisfer software (WSL, Switzerland), which inverts LAI from the angular distribution of canopy gaps over a statistically representative set of images.

Version 1.0

Initial Availability Date 1.1.2018

Data Type Image, numeric values

Rightsholder Property of FMI

Dataset/API Owner/Responsible FMI

Dataset/API Owner/Responsible Contacts [email protected]

Technology Digital hemispherical photography

Data Model: Standards, Glossaries and metadata standards GeoTiff, CSV

Data Volume Approx. 10 GB

Update Frequency Based on field campaigns; three dedicated field campaigns were conducted in 2015, 2016 and 2017

Data Archiving and preservation Local file storage

Geographical Coverage Czech Republic

Timespan 2015-2017
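The star-shaped sampling layout described above can be written down explicitly. This sketch assumes one reading of the description: the plot centre is itself a sampled point, and each of the four arms (N, E, S, W) carries three further points at 3 m spacing, giving 1 + 4 × 3 = 13 points. That interpretation is an assumption; the exact geometry in Majasalmi et al. (2012) may differ.

```python
# Assumed layout: plot centre plus 3 points on each of the 4 principal
# azimuths, spaced 3 m apart -> 1 + 4 * 3 = 13 sampling points.
SPACING_M = 3.0
POINTS_PER_ARM = 3
# Unit vectors for each azimuth, with x pointing east and y pointing north.
AZIMUTHS = {"N": (0.0, 1.0), "E": (1.0, 0.0), "S": (0.0, -1.0), "W": (-1.0, 0.0)}


def star_layout(spacing=SPACING_M, per_arm=POINTS_PER_ARM):
    """Return (dx, dy) offsets in metres from the plot centre."""
    points = [(0.0, 0.0)]                      # the centre point
    for ux, uy in AZIMUTHS.values():
        for i in range(1, per_arm + 1):        # 1st, 2nd, 3rd point on the arm
            points.append((ux * i * spacing, uy * i * spacing))
    return points
```

Under this assumption the outermost points lie 9 m from the centre, so the whole star spans 18 m and fits inside a single 20 m Sentinel-2 pixel, which matches the stated design goal.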

5.3.11 NASA CMR Landsat Datasets via FedEO Gateway (SPACEBEL - D07.02)

Field Value

Internal Name of the Dataset D07.02

Name of the Dataset/API Provider NASA CMR Landsat Datasets via FedEO Gateway

Short Description All dataset and collection metadata (including Landsat-8 collections) provided by the NASA Common Metadata Repository (CMR), around 32,000 collections, is accessible through an OGC 13-026r8 OpenSearch interface via the FedEO Gateway.

Extended Description All dataset and collection metadata (including Landsat-8 collections) provided by the NASA Common Metadata Repository (CMR), around 32,000 collections, is accessible through an OGC 13-026r8 OpenSearch interface via the FedEO Gateway (C07.01). The available geographical area and the temporal coverage of the datasets/products are specified in each collection's metadata. For Landsat-8, the coverage is global, starting in April 2013. To download Landsat-8 products, an account on the EROS Registration System (ERS) is needed (https://ers.cr.usgs.gov/register/). The download URL is included in the catalog search response.

Collection metadata, and then product metadata including the product download URL, can be accessed via the component C07.05 FedEO Portlet, which acts as a client of the FedEO Gateway (C07.01). The following picture illustrates the retrieval of Landsat-8 datasets through the FedEO Portlet (C07.05).

Initial Availability Date April 2013

Dataset/API Owner/Responsible NASA/USGS; access point via the Spacebel/ESA FedEO Gateway to collections including Landsat-8

Dataset/API Owner/Responsible Contacts [email protected], [email protected]

Dataset Data Model/API Interface Mission/collection specific. Returned product metadata is OGC 10-157r4 compliant and contains the download URL. OGC 13-026r8 OpenSearch.

Geographical Coverage Global

Timespan Landsat-8 starts 2013-04; other collections have other temporal extents, which can be found in the metadata and in the Atom dc:date elements.

Access Mechanism Accessible through an OGC 13-026r8 OpenSearch interface via the FedEO Gateway. To download Landsat-8 products, an account on the EROS Registration System (ERS) is needed (https://ers.cr.usgs.gov/register/). The download URL is included in the catalog search response. A username and password for the Sentinels Scientific Data Hub are required, to be used inside the OpenSearch request to the FedEO Gateway (geo.spacebel.be).
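A catalog search of this kind is an HTTP GET with query parameters, and the response is an Atom feed in which, under the OGC 13-026r8 Atom encoding, product download links are typically carried as atom:link elements with rel="enclosure". The sketch below assumes that convention. The endpoint URL, the parameter names (parentIdentifier, bbox, startDate, endDate, maximumRecords, httpAccept) and the collection identifier are placeholders; the real names should be read from the gateway's OpenSearch description document, and authentication is not shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: read the real template from the OpenSearch
# description document published by the FedEO Gateway.
FEDEO_SEARCH = "https://example.org/opensearch/request"
ATOM = "{http://www.w3.org/2005/Atom}"


def opensearch_url(parent_identifier, bbox, start, end, max_records=10):
    """Build a product search URL (parameter names are assumptions)."""
    west, south, east, north = bbox
    params = {
        "parentIdentifier": parent_identifier,     # collection to search in
        "bbox": f"{west},{south},{east},{north}",  # lon/lat: W,S,E,N
        "startDate": start,                        # ISO 8601 interval start
        "endDate": end,                            # ISO 8601 interval end
        "maximumRecords": max_records,
        "httpAccept": "application/atom+xml",      # request an Atom response
    }
    return f"{FEDEO_SEARCH}?{urlencode(params)}"


def enclosure_links(atom_xml):
    """Return the href of every <link rel="enclosure"> inside Atom entries."""
    root = ET.fromstring(atom_xml)
    return [
        link.get("href")
        for entry in root.iter(f"{ATOM}entry")
        for link in entry.iter(f"{ATOM}link")
        if link.get("rel") == "enclosure"
    ]
```

Separating URL construction from response parsing keeps the sketch testable offline; the actual HTTP call (for example with urllib.request) and any credentials would sit between the two functions.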

5.3.12 Ontology for (Precision) Agriculture (PSNC - D09.01)

Field Value

Internal Name of the Dataset D09.01

Name of the Dataset/API Provider Gateway Ontology for (Precision) Agriculture

Short Description The (FOODIE) ontology enables the representation of data compliant with the FOODIE data model in semantic format, and its interlinking with established vocabularies and ontologies (e.g., AGROVOC).

Extended Description In line with the FOODIE data model, different agriculture-related concepts can be described and represented, including agricultural facilities, crop and soil data, treatments, interventions, agricultural machinery, etc. Also in line with the FOODIE data model, the ontology is based on the INSPIRE directive, ISO standards (e.g. 19156, 19157) and OGC standards. The ontology can be used for different semantic tasks, such as data semantization, i.e. the transformation of (semi-)structured data (e.g., tabular, relational) to semantic format; ontology-based data access, e.g., accessing relational databases as virtual, read-only RDF graphs; and publication of linked data, including the discovery of links with relevant datasets in the Linked Open Data cloud.

Rightsholder Creative Commons Attribution 3.0

Dataset/API Owner/Responsible Contacts [email protected]

Dataset Data Model/API Interface SPARQL

Data Model: Standards, Glossaries and metadata standards OWL

Data Volume Dataset 100 KB

Geographical Coverage Agnostic

Timespan Agnostic
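Data semantization, the first task listed above, means turning tabular rows into RDF triples that can then be loaded into a triple store and queried with SPARQL. A minimal sketch that emits N-Triples from a list of row dictionaries; the namespaces and the class and property names are hypothetical stand-ins, not the published FOODIE ontology IRIs, and all values are written as plain string literals for brevity.

```python
# HYPOTHETICAL namespaces: stand-ins, not the FOODIE ontology IRIs.
BASE = "https://example.org/resource/"
VOCAB = "https://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value) -> str:
    """Encode a value as an N-Triples string literal with required escapes."""
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def rows_to_ntriples(rows, cls, id_column):
    """Map each row to a subject of class `cls`; columns become predicates."""
    triples = []
    for row in rows:
        subject = f"<{BASE}{cls}/{row[id_column]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{VOCAB}{cls}> .")
        for column, value in row.items():
            if column == id_column or value is None:
                continue                       # id is in the IRI; skip empty cells
            triples.append(f"{subject} <{VOCAB}{column}> {literal(value)} .")
    return "\n".join(triples)
```

A production mapping would attach datatypes (e.g. xsd:decimal for areas) and reuse ontology terms instead of deriving predicates from column names.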

5.3.13 Open Land Use (Lespro - D02.01)

Field Value

Internal Name of the Dataset D02.01

Name of the Dataset/API Provider Open Land Use

Short Description Open Land Use Map is a composite map intended to create detailed land-use maps of various regions based on pan-European datasets such as CORINE Land Cover and Urban Atlas, enriched by available regional data.

The dataset is derived from available open data sources at different levels of detail and coverage. These data sources include:

1) Digital cadastral maps, if available

2) Land Parcel Identification System, if available

3) Urban Atlas (European Environment Agency)

4) CORINE Land Cover 2006 (European Environment Agency)

5) OpenStreetMap

The data sources are listed by level of detail, which also determines their priority in data integration.

Extended Description The Open Land Use (OLU) data model joins the two basic data models of the INSPIRE Land Use specification: existing land use and planned land use. The main difference between the INSPIRE data models and the OLU model is that the OLU data model connects planned and existing land use data; in OLU, the various attributes are used for both types of land use data.

Land use involves the management and modification of the natural environment or wilderness into a built environment such as fields, pastures and settlements. It has also been defined as "the arrangements, activities and inputs people undertake in a certain land cover type to produce, change or maintain it" (FAO, 1997a; FAO/UNEP, 1999). Land use practices vary considerably across the world. The United Nations' Food and Agriculture Organization Water Development Division explains that "Land use concerns the products and/or benefits obtained from use of the land as well as the land management actions (activities) carried out by humans to produce those products and benefits." The OLU model also follows the INSPIRE land use specification (it uses the same data attributes; the set of attributes used is larger than in the Land Use Database Schema), but it works with a simpler view of the data. The two models can be transformed into each other, and data can also be migrated between these models and other datasets that conform to the INSPIRE specification. The main reason for the above-mentioned differences is the different usage of the data and data models: OLU is intended for any land use (and land cover) data, whereas the Land Use Database Schema serves only spatial planning data, as a special part of land use data. Several datasets could be used to create a harmonised land use dataset. Land use data are used in many specialisms, including agriculture, spatial or urban planning, environmental protection, and the maintenance and restoration of environmental functions. Currently, Open Land Use covers the whole EU with different levels of accuracy:

Europe

The base European dataset is derived from the set of available data sources that help identify the land use in a particular locality. The sources used so far at the pan-European level are:

1. Urban Atlas

2. CORINE Land Cover 2012

The sources are listed in the order in which they were combined to create the map (1 has the highest geometric and semantic precedence, and so on).

Czech Republic

The dataset is derived from the set of available data sources that help identify the land use in a particular locality. The sources used so far are:

1) Digital Cadastre

2) LPIS (Land Parcel Identification System)

3) Urban Atlas

4) CORINE Land Cover

Austria

The dataset is derived from the set of available data sources that help identify the land use in a particular locality. The sources used so far are:

1) LPIS (Land Parcel Identification System)

2) Urban Atlas

3) CORINE Land Cover

Flanders

The dataset is derived from the set of available data sources that help identify the land use in a particular locality. The sources used so far are:

1) GRBGis Large Scale Reference Database

2) Urban Atlas

3) CORINE Land Cover

Version

Initial Availability Date 2015

Rightsholder Plan4all

Dataset/API Owner/Responsible LESP

Dataset/API Owner/Responsible Contacts [email protected]

Technology GML

Name of the System Open Land Use

Dataset Data Model/API Interface REST, OGC WMS, WFS

Data Model: Standards, Glossaries and metadata standards GML

Data Volume Hundreds of GB

Update Frequency Semi-annually

Geographical Coverage Europe

Timespan 2015 - present

URI Open Land Use is available at http://sdi4apps.eu/open_land_use/
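The precedence rule used to build Open Land Use (more detailed sources take priority) can be illustrated with a deliberately simplified model in which each source is a lookup from a spatial unit to a land-use code. This is a sketch under that assumption only: the real OLU workflow combines polygon geometries rather than a precomputed set of cells, and the cell identifiers and land-use codes below are hypothetical.

```python
def integrate(sources, cells):
    """For each cell, keep the land-use code from the highest-priority source.

    sources: list of (source_name, {cell_id: land_use_code}) pairs, ordered
             from highest to lowest precedence (e.g. cadastre, LPIS,
             Urban Atlas, CORINE Land Cover, as in the lists above).
    cells:   iterable of cell identifiers to fill.
    Returns {cell_id: (land_use_code, source_name)}; cells that no source
    covers are left out.
    """
    merged = {}
    for cell in cells:
        for name, layer in sources:
            if cell in layer:              # first (most detailed) source wins
                merged[cell] = (layer[cell], name)
                break
    return merged
```

Recording the contributing source next to each value keeps the provenance of every unit visible, which is useful when the same map mixes sources of very different accuracy.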

5.3.14 Phenomics, metabolomics, genomics and environmental datasets (CERTH - DS40.01)

Field Value

Internal Name of the Dataset DS40.01

Name of the Dataset/API Provider Phenomics, metabolomics, genomics and environmental datasets

Short Description This dataset includes phenomics, metabolomics and genomics data as well as environmental data. Genomic prediction and selection data are also included.

Data Type Raw text, CSV data

Dataset/API Owner/Responsible Contacts [email protected], [email protected]

Data Volume 1-12 MB

Geographical Coverage Regions of Thessaly

5.3.15 Quality control data (METSAK - D18.04)

Field Value

Internal Name of the Dataset D18.04

Name of the Dataset/API Provider Quality control data

Short Description The quality control data consists of the forest estate, the number of the financing conclusion, the geometry of compartments, the type of forest work, sample plot locations, measured data per sample plot, measurement averages per compartment, the measurement date and user information. The quality control data will be added to the existing forest data standard during 2017.

Extended Description Quality control data on work done in forests is part of the Best Practice Guidelines for Forest Management. The data is already being collected and saved in METSAK’s information systems, but the amount of data needs to be increased. The data is also planned to be collected through a mobile application.

This pilot is about presenting the quality control data in the Metsään.fi eService for forest owners and forestry operators, and about supporting the requirement specification of a new mobile application and its interfaces. In Metsään.fi, forest owners should be able to follow the quality of the work done in their forests and compare it to the national average. Forestry operators can see the quality data of their own work in Metsään.fi and can also compare it to the national average.

Version v1.0.0

Initial Availability Date Q3/2018

Data Type Quality control data for young stand improvement and tending of seedling stands.

Personal Data End user information and personal data.

Rightsholder METSAK

Dataset/API Owner/Responsible Mobile app dataset owner: MHGS / Seppo Huurinainen

Dataset/API Owner/Responsible Contacts METSAK forest resource database (KantoRiihi) / Aki Hostikka / [email protected]

Technology Mobile app in JSON; quality control data in XML; SOAP, REST

Name of the System Laatumetsä mobile app, METSAK Forest Resource Database (KantoRiihi)

Dataset Data Model/API Interface Laatumetsä mobile app user interface, REST, SOAP, METSAK Forest Resource Database (KantoRiihi)

Data Model: Standards, Glossaries and metadata standards REST, SOAP, JSON, XML (https://www.metsatietostandardit.fi/en/)

Data Identifier - Standard used XML - https://www.metsatietostandardit.fi/en/

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/

Data Volume Expected to be 200 GB together with the Storm and Forest Damages dataset

Update Frequency Online

Data Archiving and preservation METSAK Forest Resource Database (KantoRiihi)

Geographical Coverage Finland

Timespan From Q3/2018 onwards

Access Level Available for registered users.

Access Mechanism https://tunnistaminen.suomi.fi

URI https://www.wuudis.com/fi/laatumetsa/

5.3.16 Sentinels Scientific Hub Datasets via FedEO Gateway (SPACEBEL - D07.01)

Sentinel products available on the Sentinels Scientific Data Hub (Sentinel-1, Sentinel-2) can be discovered and accessed via the FedEO Gateway (C07.01). The gateway returns Sentinel collection and dataset metadata, including the product download URL, through an OGC 13-026r8 OpenSearch interface. Coverage is global. Temporal coverage starts in April 2014 for Sentinel-1 and in June 2015 for Sentinel-2. Access to the dataset metadata and the products requires an account (user name and password), which can be obtained at https://scihub.copernicus.eu/dhus/#/self-registration. Sentinel products and metadata can also be accessed through the user interface of the FedEO Portlet (C07.05).

Field Value

Internal Name of the Dataset D07.01


Name of the Dataset/API Provider Sentinels Scientific Hub Datasets via FedEO Gateway

Short Description Sentinel products available on the Sentinels Scientific Data Hub (Sentinel-1, Sentinel-2) can be discovered and accessed via the FedEO Gateway (C07.01). The gateway returns Sentinel collection and dataset metadata, including the product download URL, through an OGC 13-026r8 OpenSearch interface. Coverage is global. Temporal coverage starts in April 2014 for Sentinel-1 and in June 2015 for Sentinel-2. Access to the dataset metadata and the products requires an account (user name and password), which can be obtained at https://scihub.copernicus.eu/dhus/#/self-registration. Sentinel products and metadata can also be accessed through the user interface of the FedEO Portlet (C07.05).

Extended Description All datasets (collections) available through the Sentinels Scientific Hub are accessible through standard protocols via the Spacebel component C07.01 FedEO Gateway. These collections include Sentinel-1, Sentinel-2, … ESA/Spacebel publish detailed collection information in the FedEO Collection Catalog. This information can be provided in various metadata flavours, including ISO19139, ISO19139-2, ISO MENDS and DIF-10, or visualised on a user interface. Examples:

http://geo.spacebel.be/opensearch/request?uid=EOP:ESA:SENTINEL_1

http://geo.spacebel.be/opensearch/request?uid=EOP:ESA:S2MSI1C

Dataset/API Owner/Responsible Spacebel

Dataset/API Owner/Responsible Contacts [email protected], [email protected]


Dataset Data Model/API Interface OGC 13-026r8 OpenSearch

Geographical Coverage Global

Timespan Sentinel-1 starts 2014-04 and Sentinel-2 starts 2015-06; other collections have other temporal extents.

Access Mechanism Requires a user name and password for the Sentinels Scientific Data Hub, which are supplied with the OpenSearch request to the FedEO Gateway (geo.spacebel.be).
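The example collection URLs in the Extended Description above can be assembled programmatically. The Python sketch below (standard library only) rebuilds them from the `uid` parameter and wraps a URL in a GET request. This section states only that the Data Hub user name and password are used inside the OpenSearch request. Sending them as HTTP Basic authentication is an assumption made for illustration, and the helper names are hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Base endpoint taken from the example URLs in this section.
FEDEO_OPENSEARCH = "http://geo.spacebel.be/opensearch/request"


def collection_request_url(uid: str) -> str:
    """Build the URL that asks the FedEO Gateway for one collection's metadata."""
    # safe=":" keeps identifiers such as EOP:ESA:S2MSI1C readable in the URL.
    return f"{FEDEO_OPENSEARCH}?{urlencode({'uid': uid}, safe=':')}"


def authenticated_request(url: str, user: str, password: str) -> Request:
    """Wrap the URL in a GET request carrying HTTP Basic credentials.

    ASSUMPTION: the gateway accepts the Data Hub user name and password as
    HTTP Basic authentication; check the FedEO documentation before relying on it.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")


if __name__ == "__main__":
    for uid in ("EOP:ESA:SENTINEL_1", "EOP:ESA:S2MSI1C"):
        print(collection_request_url(uid))
```

Keeping `:` unescaped (`safe=":"`) reproduces the identifiers exactly as they appear in the example URLs. A server would decode the percent-encoded form `%3A` to the same value.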

5.3.17 SigPAC (Tragsa - D11.05)

The CAP (Common Agricultural Policy) information system is a land parcel identification system (LPIS). It is provided by the Junta de Castilla y León, the autonomous regional government.

Field Value

Internal Name of the Dataset D11.05

Name of the Dataset/API Provider SigPAC

Short Description LPIS - Land parcel identification system.

Extended Description A land parcel identification system (LPIS) is a system for identifying land use in a given country. It uses orthophotos, i.e. aerial photographs and high-precision satellite images that are digitally processed to extract as much meaningful spatial information as possible. Each land parcel is given a unique number that identifies it in space and time. This information is updated regularly to monitor changes in land cover and crop management.


Initial Availability Date Starting date of the project

Data Type ESRI Shapefile, SQLite databases

Personal Data No

Rightsholder FEGA, the CAP payment agency in Spain

Dataset/API Owner/Responsible Contacts www.mapama.gob.es

Data Model: Standards, Glossaries and metadata standards More information at https://ec.europa.eu/jrc/en/research-topic/agricultural-monitoring

Data Model - Specific Data Model European countries share some commonalities, but the LPIS model differs in each Member State.

Data Volume Less than 1 GB

Update Frequency Yearly

Geographical Coverage Spain

Access Level Free in some regions, private in others.

5.3.18 Smart POI dataset (Lespro - D02.01)

Field Value


Internal Name of the Dataset D02.01

Name of the Dataset/API Provider Smart POI dataset

Extended Description The Smart Points of Interest dataset (SPOI) is a seamless, open resource of POIs that all users can download, search or reuse in applications and services. SPOI's principal aim is to provide information as Linked Data, together with another dataset containing the road network. Compared to similar solutions, the Smart approach adds value in three ways:

- it implements Linked Data;
- it uses standardized, well-established datatype properties;
- it provides a fully harmonized dataset with a uniform data model and common classification.

The SPOI dataset combines global data (selected points from OpenStreetMap) with local data provided by the SDI4Apps partners or available on the web. The dataset can be queried through a SPARQL endpoint (http://data.plan4all.eu/sparql); for detailed information see http://sdi4apps.eu/spoi.

Rightsholder Available under the Open Data Commons Open Database License (ODbL, http://opendatacommons.org/licenses/odbl/)
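Under the W3C SPARQL 1.1 Protocol, a query can be sent to an endpoint such as http://data.plan4all.eu/sparql as an HTTP GET request, with the query text in the `query` parameter. The Python sketch below only prepares such a request. The generic triple-pattern query is an illustrative assumption that does not rely on the SPOI vocabulary. Support for the requested `application/sparql-results+json` result format is also assumed.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given in the SPOI description.
SPOI_ENDPOINT = "http://data.plan4all.eu/sparql"

# Illustrative query: it assumes nothing about the SPOI vocabulary and
# simply asks for a handful of triples.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL 1.1 Protocol query-via-GET request (not sent here)."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard SPARQL JSON results serialization.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(SPOI_ENDPOINT, SAMPLE_QUERY)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the query and return the result bindings as JSON.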

5.3.19 Stand age map (FMI - D14.04)

Field Value


Internal Name of the Dataset D14.04

Name of the Dataset/API Provider Stand age map

Short Description Vector layer based on Czech forest management plans, with stand age derived from a detailed forest inventory. It covers the whole country and is updated every 10 years.

5.3.20 Storm and forest damage observations and possible risk areas (METSAK - D18.03a)

Field Value

Internal Name of the Dataset D18.03a

Name of the Dataset/API Provider Storm and forest damage observations and possible risk areas

Short Description Storm and forest damage observations are one of the new data types in this pilot; they are planned to be crowdsourced. Each storm damage observation consists of:

- the location;
- the type of damage;
- an evaluation of the extent of the damage;
- the tree species;
- the distance from the road.

The storm and forest damage data supplement the forest resource data. Possible storm and forest damage areas are evaluated from the collected observations, and the resulting risk areas are presented to users on a map layer.


Extended Description METSAK currently gathers this information through the forest use declaration process. To improve the overall management of storm damage and to prevent further damage, it is essential to obtain field data as soon as possible. One way to gather this information is a mobile app that allows anyone to report observations to the forestry experts at the Finnish Forest Centre. Based on the crowdsourced information, forestry experts can react faster than before, which can prevent further damage, for instance from pest attacks. Damaged wood can also be routed more quickly to the most suitable place for further processing.

Version v1.0.0

Initial Availability Date Q4/2018

Data Type XML

Personal Data No personal data gathered

Rightsholder METSAK

Other Rights Information MHGS provides the mobile app for data collection

Dataset/API Owner/Responsible Mobile app dataset owner MHGS / Seppo Huurinainen; METSAK / Virpi Stenman

Dataset/API Owner/Responsible Contacts METSAK forest damages database / Mikko Kesälä / [email protected]

Technology Mobile app in JSON; storm and forest damage data in XML; SOAP, REST

Name of the System Laatumetsä mobile app, Metsäkeskus map service (https://metsakeskus.maps.arcgis.com/home/index.html)

Dataset Data Model/API Interface WMS maps (XML standardization is ongoing); METSAK user interface; Laatumetsä mobile app user interface; REST; SOAP

Data Model: Standards, Glossaries and metadata standards REST, SOAP, JSON, XML; https://www.metsatietostandardit.fi/en/

Data Identifier - Standard used XML, OGC and WMS maps; XML - https://www.metsatietostandardit.fi/en/

Data Model - Specific Data Model https://www.bitcomp.fi/metsatietostandardit/


Data Volume Expected to be 200 GB together with the Quality Control dataset (Laatumetsä)

Update Frequency Online

Data Archiving and preservation The data is stored and backed up in the METSAK map service database

Geographical Coverage Finland

Timespan Q4/2018 onwards

Access Level open

Access Mechanism open

URI Mobile app: https://www.wuudis.com/fi/laatumetsa/

METSAK map service:

https://metsakeskus.maps.arcgis.com/home/index.html
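The storm damage observation described above has a small, fixed set of attributes: location, damage type, extent, tree species and distance from the road. The Python sketch below models one crowdsourced report as a dataclass and serializes it to JSON, the format this section gives for the mobile app. All field names, units, value codes and example values are hypothetical. The XML structure for this data is still being standardized.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StormDamageObservation:
    """One crowdsourced storm/forest damage report (hypothetical field names)."""
    latitude: float            # WGS84 latitude (assumed coordinate system)
    longitude: float           # WGS84 longitude
    damage_type: str           # e.g. "windthrow" (hypothetical code)
    extent: str                # observer's evaluation of the damage extent
    tree_species: str          # affected tree species
    distance_to_road_m: float  # distance from the road in metres

    def __post_init__(self) -> None:
        # Reject obviously invalid reports before they are uploaded.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.distance_to_road_m < 0:
            raise ValueError("distance to road cannot be negative")

    def to_json(self) -> str:
        """Serialize the report as a flat JSON object."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


# Example values only.
obs = StormDamageObservation(latitude=62.24, longitude=25.75, damage_type="windthrow",
                             extent="small", tree_species="spruce",
                             distance_to_road_m=150.0)
print(obs.to_json())
```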

5.3.21 Forest road condition observations (METSAK - D18.03b)

Field Value

Internal Name of the Dataset D18.03b

Name of the Dataset/API Provider Forest road condition observations / Roads.ML


Short Description Forest road condition observations are one of the new data types in this pilot; they are planned to be crowdsourced. Each observation consists of:

- the location;
- the road type, based on the Digiroad map;
- an evaluation of the road condition;
- possible road limitations or obstacles on the road;
- the forest development classes of the road surroundings.

The road and forest felling potential data supplement the open forest resource data. In future, priorities for road improvement might be evaluated based on the collected road condition observations. Both the observed road condition and the related felling potential are presented to users on an openly available map layer.

Extended Description METSAK does not currently gather this information. Better knowledge of the current condition and availability of the road network is essential for the forest industry's logistics chain and for ensuring the wood supply. Crowdsourcing through a mobile app can be used to collect field data as quickly as possible. Based on the crowdsourced information, forestry experts in the forest industry can react faster than before, which can prevent disruptions in the wood supply chain.

Version v1.0.0

Initial Availability Date Q4/2018

Data Type WMS


Personal Data No personal data gathered

Rightsholder METSAK

Other Rights Information Roads.ML provides the mobile app for data collection

Dataset/API Owner/Responsible Mobile app dataset owner Roads.ml / Jussi-Pekka Martikainen; METSAK map service / Mikko Kesälä

Dataset/API Owner/Responsible Contacts METSAK forest road map / Mikko Kesälä / [email protected]

Technology Mobile app in PostgreSQL; forest road data provided through a GIS interface

Name of the System Roads.ml mobile app, Metsäkeskus map service (https://metsakeskus.maps.arcgis.com/home/index.html)

Dataset Data Model/API Interface WMS maps; METSAK user interface for the WMS map; Roads.ml mobile app user interface; REST

Data Model: Standards, Glossaries and metadata standards REST, PostgreSQL


Data Identifier - Standard used OGC and WMS-maps

Data Model - Specific Data Model Based on OGC standards

Data Volume Expected to be around 20 GB

Update Frequency Online

Data Archiving and preservation PostgreSQL database

Geographical Coverage Finland

Timespan Q4/2018 onwards

Access Level open

Access Mechanism open

URI Mobile app: www.roads.ml

METSAK map service:

https://metsakeskus.maps.arcgis.com/home/index.html
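The road condition observations are published as a WMS map layer, so a client only needs to build an OGC WMS 1.3.0 GetMap URL. The Python sketch below does that with the standard library. The service URL and layer name are hypothetical placeholders, because this section gives only the ArcGIS portal address and not a WMS endpoint. The default CRS, EPSG:3067 (ETRS-TM35FIN, the projected coordinate system commonly used in Finland), is also an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and layer name; replace with the real METSAK WMS values.
WMS_URL = "https://example.org/metsak/wms"
ROAD_LAYER = "forest_road_condition"


def getmap_url(bbox: tuple[float, float, float, float], width: int, height: int,
               crs: str = "EPSG:3067") -> str:
    """Build a WMS 1.3.0 GetMap URL for the (hypothetical) road condition layer.

    bbox is (min_x, min_y, max_x, max_y) in the axis order of `crs`;
    for EPSG:3067 that is easting, northing in metres.
    """
    min_x, min_y, max_x, max_y = bbox
    if min_x >= max_x or min_y >= max_y:
        raise ValueError("bbox minimums must be smaller than maximums")
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ROAD_LAYER,
        "STYLES": "",              # empty value selects the server's default style
        "CRS": crs,
        "BBOX": f"{min_x},{min_y},{max_x},{max_y}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    return f"{WMS_URL}?{urlencode(params)}"


# A 20 km x 20 km window in central Finland (example coordinates).
url = getmap_url((380000.0, 6900000.0, 400000.0, 6920000.0), 512, 512)
query = parse_qs(urlsplit(url).query)
print(url)
print(query["BBOX"])
```

A GetMap request returns a rendered image of the layer, not the underlying observation records.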

5.3.22 Tree species map (FMI - D14.03)

Field Value

Internal Name of the Dataset D14.03


Name of the Dataset/API Provider Tree species map

Short Description Tree species map: a raster dataset based on the classification of multi-temporal Sentinel-2 data and the National Forest Inventory of the Czech Republic. It has a 20 m spatial resolution and distinguishes the six most abundant tree species in the Czech Republic.

Data Type Raster dataset

Rightsholder Property of FMI

Dataset/API Owner/Responsible Contacts lukes.petr@uhul.cz

Data Model: Standards, Glossaries and metadata standards GeoTIFF

Data Volume 1 GB

Update Frequency Fixed

Geographical Coverage Czech Republic

Timespan 2017
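The tree species map has a 20 m spatial resolution, so each pixel covers 20 m × 20 m = 400 m², i.e. 0.04 ha. Per-species areas therefore follow directly from pixel counts. The Python sketch below shows this arithmetic on a toy grid. The integer class codes, the no-data value 0 and the toy raster are hypothetical, since this section does not give the legend of the six species classes. Reading the actual GeoTIFF would require a raster library.

```python
from collections import Counter

PIXEL_SIZE_M = 20.0                    # spatial resolution from the dataset description
PIXEL_AREA_M2 = PIXEL_SIZE_M ** 2      # 400 m2 per pixel
M2_PER_HA = 10_000


def species_area_ha(class_raster: list[list[int]], nodata: int = 0) -> dict[int, float]:
    """Return the area in hectares for each class code found in the raster.

    class_raster is a 2-D grid of integer class codes (hypothetical coding);
    pixels equal to `nodata` are skipped.
    """
    counts = Counter(code for row in class_raster for code in row if code != nodata)
    # Multiply first, then divide, so whole-pixel areas convert cleanly to hectares.
    return {code: n * PIXEL_AREA_M2 / M2_PER_HA for code, n in sorted(counts.items())}


# Toy 3 x 4 raster with two hypothetical species codes (1, 2) and no-data (0).
toy = [
    [1, 1, 2, 0],
    [1, 2, 2, 0],
    [1, 1, 1, 2],
]
print(species_area_ha(toy))
```

Code 1 occupies 6 pixels (6 × 0.04 ha = 0.24 ha) and code 2 occupies 4 pixels (0.16 ha).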


5.3.23 Wuudis data (MHGS - D20.01)

Wuudis uses the Finnish forest information standard as its basic data import/export format, and the Wuudis service data model is based on this standard. All development activities during the DataBio project that affect the Wuudis data model are based on the Finnish forest information standard. The standard includes a set of standardized schemas (e.g. timber sales, logistics). Some of these schemas can be used in DataBio, and some new specifications are being developed during the project.

Basic information about the forest information standard is available at http://www.metsatietostandardit.fi/en. The base XML schema description of the standard is available at https://extra.bitcomp.fi/metsastandardi_ehdotus/V8/MV/doc/index.html. This schema covers basic forest property data, stands, operations and tree strata, and everything is built on this basic real estate information. The whole schema repository is available at https://www.bitcomp.fi/metsatietostandardit/

Wuudis also has an open REST API that uses plain JSON, which is faster than standards-based XML data transfer. The JSON interface also supports various query parameters, and data can be fetched in parts (e.g. a single stand or operation). All available resources are listed in the WADL documentation: https://wuudis.com/api/application.wadl

Map layers are another important dataset for Wuudis. Wuudis uses global map services such as Google and Microsoft (Bing) to provide worldwide satellite map layers to end users. It also provides map layers from the National Land Survey of Finland's WMS/WMTS service. More information about the National Land Survey of Finland map services is available at http://www.maanmittauslaitos.fi/en/maps-and-spatial-data/maps/view-maps

Field Value

Internal Name of the Dataset D20.01

Name of the Dataset/API Provider Wuudis data

Short Description Wuudis uses the Finnish forest information standard as its basic data import/export format, and the Wuudis service data model is based on this standard. All development activities during the DataBio project that affect the Wuudis data model are based on the Finnish forest information standard.

Extended Description The forest information standard includes a set of standardized schemas (e.g. timber sales, logistics). Some of these schemas can be used in DataBio, and some new specifications are being developed during the project.

Basic information about the forest information standard: http://www.metsatietostandardit.fi/en. The base XML schema description of the standard is available at https://extra.bitcomp.fi/metsastandardi_ehdotus/V8/MV/doc/index.html. This schema covers basic forest property data, stands, operations and tree strata, and everything is built on this basic real estate information. The whole schema repository is available at https://www.bitcomp.fi/metsatietostandardit/

Wuudis also has an open REST API that uses plain JSON, which is faster than standards-based XML data transfer. The JSON interface also supports various query parameters, and data can be fetched in parts (e.g. a single stand or operation). All available resources are listed in the WADL documentation: https://wuudis.com/api/application.wadl

Map layers are another important dataset for Wuudis. Wuudis uses global map services such as Google and Microsoft (Bing) to provide worldwide satellite map layers to end users. It also provides map layers from the National Land Survey of Finland's WMS/WMTS service. More information about the National Land Survey of Finland map services: http://www.maanmittauslaitos.fi/en/maps-and-spatial-data/maps/view-maps
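WADL is an XML format in which nested `resource` elements carry relative `path` attributes beneath a `resources` element with a base URI. The endpoint list in https://wuudis.com/api/application.wadl can therefore be flattened with the Python standard library. The sketch below parses an inline WADL fragment. Its base URI and resource paths (`stands`, `operations`) are hypothetical stand-ins; the real paths must be read from the Wuudis WADL document.

```python
import xml.etree.ElementTree as ET

WADL_NS = {"wadl": "http://wadl.dev.java.net/2009/02"}

# Hypothetical WADL fragment; the real document is served at
# https://wuudis.com/api/application.wadl
SAMPLE_WADL = """<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://wuudis.com/api/">
    <resource path="stands">
      <method name="GET"/>
      <resource path="{id}">
        <method name="GET"/>
      </resource>
    </resource>
    <resource path="operations">
      <method name="GET"/>
    </resource>
  </resources>
</application>"""


def list_endpoints(wadl_xml: str) -> list[tuple[str, str]]:
    """Return (HTTP method, full URL template) pairs declared in a WADL document."""
    root = ET.fromstring(wadl_xml)
    endpoints: list[tuple[str, str]] = []

    def walk(element: ET.Element, prefix: str) -> None:
        for res in element.findall("wadl:resource", WADL_NS):
            path = prefix.rstrip("/") + "/" + res.get("path", "").strip("/")
            for method in res.findall("wadl:method", WADL_NS):
                endpoints.append((method.get("name", ""), path))
            walk(res, path)  # nested resources extend the parent path

    for resources in root.findall("wadl:resources", WADL_NS):
        walk(resources, resources.get("base", ""))
    return endpoints


for method, url in list_endpoints(SAMPLE_WADL):
    print(method, url)
```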

5.4 Recommended interaction structures: ATOS

As presented in the previous sections, each of DataBio's pilots requires a heterogeneous set of datasets. These datasets are made available in different remote systems, formats and encodings, and at different spatial and temporal resolutions.

This section gives examples of how DataBio's components manage and use some of the most commonly used pilot datasets in an interoperable manner, through standardized interfaces and protocols (APIs).

Dataset name: DATASET NAME

Pilot: A1 / 3.2.1 Oceanic tuna fisheries immediate operational choices

Component: C05.01 Rasdaman

API/Operation: OGC WCS - GetCoverage

Example: Retrieve a subset area (latitude 13.41 to 14.82, longitude 76.67 to 78.14), encoded as GML, from the variable mlotst for a specific date (2018-06-26).

Request: http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=GetCoverage&COVERAGEID=mlotst&SUBSET=Lat(13.41,14.82)&SUBSET=Long(76.67,78.14)&SUBSET=ansi(%222018-06-26T00:00:00.000Z%22,%222018-06-26T00:00:00.000Z%22)&FORMAT=application/gml+xml

Response:

<gmlcov:ReferenceableGridCoverage

    xmlns="http://www.opengis.net/gml/3.2"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:gmlcov="http://www.opengis.net/gmlcov/1.0"
    xmlns:swe="http://www.opengis.net/swe/2.0"
    xmlns:wcs="http://www.opengis.net/wcs/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    gml:id="mlotst"
    xsi:schemaLocation="http://www.opengis.net/wcs/2.0 http://schemas.opengis.net/wcs/2.0/wcsAll.xsd">
  <boundedBy>
    <Envelope axisLabels="Lat Long ansi" srsDimension="3" srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate" uomLabels="degree degree d">
      <lowerCorner>13.35467349552 76.618871415346 "2018-06-26T00:00:00.000Z"</lowerCorner>
      <upperCorner>14.8527528809232 78.2007400554937 "2018-06-26T00:00:00.000Z"</upperCorner>
    </Envelope>
  </boundedBy>
  <domainSet>
    <gmlrgrid:ReferenceableGridByVectors xmlns:gmlrgrid="http://www.opengis.net/gml/3.3/rgrid" dimension="3" gml:id="mlotst-grid" xsi:schemaLocation="http://www.opengis.net/gml/3.3/rgrid http://schemas.opengis.net/gml/3.3/referenceableGrid.xsd">
      <limits>
        <GridEnvelope>
          <low>182 620 21</low>
          <high>199 638 21</high>
        </GridEnvelope>
      </limits>
      <axisLabels>Lat Long ansi</axisLabels>
      <gmlrgrid:origin>
        <Point gml:id="mlotst-origin" srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">
          <pos>14.811139564662 76.66049953745515 "2018-06-26T00:00:00.000Z"</pos>
        </Point>
      </gmlrgrid:origin>
      <gmlrgrid:generalGridAxis>
        <gmlrgrid:GeneralGridAxis>
          <gmlrgrid:offsetVector srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">-0.0832266325224 0 0</gmlrgrid:offsetVector>
          <gmlrgrid:coefficients/>
          <gmlrgrid:gridAxesSpanned>Lat</gmlrgrid:gridAxesSpanned>
          <gmlrgrid:sequenceRule axisOrder="+1">Linear</gmlrgrid:sequenceRule>
        </gmlrgrid:GeneralGridAxis>
      </gmlrgrid:generalGridAxis>
      <gmlrgrid:generalGridAxis>
        <gmlrgrid:GeneralGridAxis>
          <gmlrgrid:offsetVector srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">0 0.0832562442183 0</gmlrgrid:offsetVector>
          <gmlrgrid:coefficients/>
          <gmlrgrid:gridAxesSpanned>Long</gmlrgrid:gridAxesSpanned>
          <gmlrgrid:sequenceRule axisOrder="+1">Linear</gmlrgrid:sequenceRule>
        </gmlrgrid:GeneralGridAxis>
      </gmlrgrid:generalGridAxis>
      <gmlrgrid:generalGridAxis>
        <gmlrgrid:GeneralGridAxis>
          <gmlrgrid:offsetVector srsName="http://localhost:8080/def/crs-compound?1=http://localhost:8080/def/crs/EPSG/0/4326&2=http://localhost:8080/def/crs/OGC/0/AnsiDate">0 0 1</gmlrgrid:offsetVector>
          <gmlrgrid:coefficients>"2018-06-26T00:00:00.000Z"</gmlrgrid:coefficients>
          <gmlrgrid:gridAxesSpanned>ansi</gmlrgrid:gridAxesSpanned>
          <gmlrgrid:sequenceRule axisOrder="+1">Linear</gmlrgrid:sequenceRule>
        </gmlrgrid:GeneralGridAxis>
      </gmlrgrid:generalGridAxis>
    </gmlrgrid:ReferenceableGridByVectors>
  </domainSet>
  <rangeSet>
    <DataBlock>
      <rangeParameters/>
      <tupleList cs=" " ts=",">
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,
        -32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767,-32767
      </tupleList>
    </DataBlock>
  </rangeSet>
  <coverageFunction>
    <GridFunction>
      <sequenceRule axisOrder="+2 +1 +3">Linear</sequenceRule>
      <startPoint>182 620 21</startPoint>
    </GridFunction>
  </coverageFunction>
  <gmlcov:rangeType>
    <swe:DataRecord>
      <swe:field name="Gray">
        <swe:Quantity xmlns:swe="http://www.opengis.net/swe/2.0">
          <swe:label>Gray</swe:label>
          <swe:nilValues>
            <swe:NilValues>
              <swe:nilValue reason="">-32767</swe:nilValue>
            </swe:NilValues>
          </swe:nilValues>
          <swe:uom code="10^0"/>
        </swe:Quantity>
      </swe:field>
    </swe:DataRecord>
  </gmlcov:rangeType>
  <gmlcov:metadata/>
</gmlcov:ReferenceableGridCoverage>
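The GetCoverage request above is a WCS 2.0.1 key-value-pair URL in which `SUBSET` is repeated once per axis, so it must be built from a list of pairs rather than a dictionary. The Python sketch below rebuilds that request and checks a GML response. The GridEnvelope limits are inclusive, giving (199 - 182 + 1) × (638 - 620 + 1) × (21 - 21 + 1) = 18 × 19 × 1 = 342 cells. This matches the 342 values in the tupleList above, all of which equal the nil value -32767.

As printed, the srsName attributes contain an unescaped `&`, which a strict parser such as `xml.etree.ElementTree` would reject. The trimmed response in the code therefore keeps only the elements the function reads. Helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlsplit, parse_qsl

RASDAMAN_OWS = "http://150.254.165.231:8080/rasdaman/ows"
GML = {"gml": "http://www.opengis.net/gml/3.2"}


def getcoverage_url(coverage_id: str, subsets: list[str], fmt: str) -> str:
    """WCS 2.0.1 GetCoverage KVP URL; SUBSET repeats, so a list of pairs is used."""
    params = [("SERVICE", "WCS"), ("VERSION", "2.0.1"),
              ("REQUEST", "GetCoverage"), ("COVERAGEID", coverage_id)]
    params += [("SUBSET", s) for s in subsets]
    params.append(("FORMAT", fmt))  # the '+' in gml+xml is sent as %2B
    return f"{RASDAMAN_OWS}?{urlencode(params)}"


def summarize_grid(gml_text: str, nil_value: float) -> dict[str, int]:
    """Compare the GridEnvelope size with the tupleList length and count nil cells."""
    root = ET.fromstring(gml_text)
    low = [int(v) for v in root.find(".//gml:GridEnvelope/gml:low", GML).text.split()]
    high = [int(v) for v in root.find(".//gml:GridEnvelope/gml:high", GML).text.split()]
    cells = 1
    for lo, hi in zip(low, high):
        cells *= hi - lo + 1  # grid limits are inclusive on both ends
    tuple_list = root.find(".//gml:tupleList", GML)
    separator = tuple_list.get("ts", ",")  # tuple separator declared by the server
    values = [float(v) for v in tuple_list.text.split(separator) if v.strip()]
    return {"cells": cells, "values": len(values),
            "nil": sum(1 for v in values if v == nil_value)}


# Trimmed response: only the elements read above; srsName attributes are omitted.
NIL_TUPLES = ",".join(["-32767"] * 342)
SAMPLE_GML = f"""<gmlcov:ReferenceableGridCoverage
    xmlns="http://www.opengis.net/gml/3.2"
    xmlns:gmlcov="http://www.opengis.net/gmlcov/1.0"
    xmlns:gmlrgrid="http://www.opengis.net/gml/3.3/rgrid">
  <domainSet>
    <gmlrgrid:ReferenceableGridByVectors>
      <limits><GridEnvelope><low>182 620 21</low><high>199 638 21</high></GridEnvelope></limits>
    </gmlrgrid:ReferenceableGridByVectors>
  </domainSet>
  <rangeSet><DataBlock><tupleList cs=" " ts=",">{NIL_TUPLES}</tupleList></DataBlock></rangeSet>
</gmlcov:ReferenceableGridCoverage>"""

SUBSETS = ["Lat(13.41,14.82)", "Long(76.67,78.14)",
           'ansi("2018-06-26T00:00:00.000Z","2018-06-26T00:00:00.000Z")']
url = getcoverage_url("mlotst", SUBSETS, "application/gml+xml")
print(url)
# Decoding the query string recovers the three SUBSET values unchanged.
print([v for k, v in parse_qsl(urlsplit(url).query) if k == "SUBSET"])
print(summarize_grid(SAMPLE_GML, nil_value=-32767.0))
```

`urlencode` sends the `+` in `application/gml+xml` as `%2B`. If it is left literal, as in the request above, a server that applies form decoding could read it as a space.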

Dataset name: DATASET NAME

Pilot: A1 / 3.2.1 Oceanic tuna fisheries immediate operational choices

Component: C05.01 Rasdaman

API/Operation: OGC WCPS - ProcessCoverages

Example: Calculate the mean value of the variable “mlotst” over the whole Indian Ocean for all time periods and return it as text.

Request: http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=ProcessCoverages&query=for $s in ( mlotst ) return encode( avg($s), "text/csv" )

Response:

38.206644819155315
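With ProcessCoverages, the entire WCPS expression travels in a single `query` parameter. Because the expression contains spaces, `$`, quotes and parentheses, it should be percent-encoded. The Python sketch below encodes the averaging query above and parses a single-value text/csv body like the response shown. It does not contact the server, and the helper names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

RASDAMAN_OWS = "http://150.254.165.231:8080/rasdaman/ows"
AVG_QUERY = 'for $s in ( mlotst ) return encode( avg($s), "text/csv" )'


def process_coverages_url(wcps_query: str) -> str:
    """Build a WCS ProcessCoverages request that carries a WCPS query."""
    params = {"SERVICE": "WCS", "VERSION": "2.0.1",
              "REQUEST": "ProcessCoverages", "query": wcps_query}
    return f"{RASDAMAN_OWS}?{urlencode(params)}"


def parse_scalar_csv(body: str) -> float:
    """Parse a response holding one scalar value, such as the avg() result above."""
    return float(body.strip())


url = process_coverages_url(AVG_QUERY)
print(url)
# The encoded query decodes back to the original WCPS text.
print(parse_qs(urlsplit(url).query)["query"][0] == AVG_QUERY)
print(parse_scalar_csv("38.206644819155315\n"))
```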

Dataset

name:

DATASET NAME

Pilot: A1 /3.2.1 Oceanic tuna fisheries immediate operational choices

Compon

ent:

C05.01 Rasdaman

API/Oper

ation:

OGC WCS - ProcessCoverage

Example: Produce a colorized map (in png format) of the whole Indian Ocean Area depending on the values of the “mlotst” variable for a specific time period Request: http://150.254.165.231:8080/rasdaman/ows?&SERVICE=WCS&VERSION=2.0.1&REQUEST=ProcessCoverages& query=for $c in ( mlotst ) return encode(switch case $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] = 99999 return {red: 255; green: 255; blue: 255} case 18 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 0; green: 0; blue: 255} case 23 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 255; green: 255; blue: 0} case 30 > $c[ansi("2018-05-30"), Lat(-35:30), Long(25:115)] return {red: 255; green: 140; blue: 0} default return {red: 255; green: 0; blue: 0} , "image/png")


Response: [image: colorized PNG map of the Indian Ocean for 2018-05-30]
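The switch expression in this request classifies every cell of the 2018-05-30 slice by thresholds: the fill value 99999 becomes white, values below 18 blue, below 23 yellow, below 30 orange, and everything else red. The sketch below reproduces that classification for a single value in Python and regenerates the same WCPS query from a small class table, which makes the colour scale easier to change. The names classify, build_switch_query and CLASSES are hypothetical; the thresholds and colours are copied from the query.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

FILL_VALUE = 99999                 # cells equal to this are painted white
WHITE: RGB = (255, 255, 255)
# (bound, colour): the first bound greater than the value wins, mirroring "case N > $c".
CLASSES: List[Tuple[int, RGB]] = [
    (18, (0, 0, 255)),             # blue
    (23, (255, 255, 0)),           # yellow
    (30, (255, 140, 0)),           # orange
]
DEFAULT: RGB = (255, 0, 0)         # red, the "default return" branch


def classify(value: float) -> RGB:
    """Colour a single mlotst value the same way the WCPS switch expression does."""
    if value == FILL_VALUE:
        return WHITE
    for bound, colour in CLASSES:
        if bound > value:
            return colour
    return DEFAULT


def build_switch_query(coverage: str, subset: str) -> str:
    """Regenerate the switch/case WCPS query from the class table above."""
    target = f"$c[{subset}]"

    def rgb(c: RGB) -> str:
        return f"{{red: {c[0]}; green: {c[1]}; blue: {c[2]}}}"

    cases = [f"case {target} = {FILL_VALUE} return {rgb(WHITE)}"]
    cases += [f"case {bound} > {target} return {rgb(colour)}" for bound, colour in CLASSES]
    body = " ".join(cases) + f" default return {rgb(DEFAULT)}"
    return f'for $c in ({coverage}) return encode(switch {body}, "image/png")'
```

Calling build_switch_query("mlotst", 'ansi("2018-05-30"), Lat(-35:30), Long(25:115)') yields the query used in the request, apart from whitespace.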

Dataset name: DATASET NAME

Pilot: Pilot 1.3.1.B1.1: Cereals and biomass crop

Component: C05.02 FIWARE IoT Hub

API/Operation: CRUD operations via a RESTful API

Example: Device Registration:


End-point URL: http://{$host_url:$host_port}/iot/devices

Payload example in JSON format:

{
  "devices": [
    {
      "device_id": "raspberryPI1",
      "entity_name": "Field1",
      "entity_type": "Field",
      "protocol": "MQTT",
      "timezone": "Europe/Madrid",
      "attributes": [
        { "name": "leaf_condensation", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "temperature", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "humidity", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "soil_humidity", "type": "double", "metadatas": [ { "name": "units", "type": "string" } ] },
        { "name": "Device", "type": "string" }
      ],
      "commands": [ { "name": "ping", "type": "command" } ]
    }
  ]
}
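The registration call can be scripted with the Python standard library. This is a sketch, not the pilot's code. The base_url parameter stands in for http://{$host_url:$host_port}. The Fiware-Service and Fiware-ServicePath headers are an assumption, based on how FIWARE IoT Agents usually scope devices to a tenant. The helper names device_payload and register_device are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Sensor attributes from the payload above; each is a double with a "units" metadata entry.
SENSOR_ATTRIBUTES = ["leaf_condensation", "temperature", "humidity", "soil_humidity"]


def device_payload(device_id: str, entity_name: str, entity_type: str = "Field") -> dict:
    """Build a registration body with the same structure as the example payload."""
    attributes = [
        {"name": name, "type": "double", "metadatas": [{"name": "units", "type": "string"}]}
        for name in SENSOR_ATTRIBUTES
    ]
    attributes.append({"name": "Device", "type": "string"})
    return {"devices": [{
        "device_id": device_id,
        "entity_name": entity_name,
        "entity_type": entity_type,
        "protocol": "MQTT",
        "timezone": "Europe/Madrid",
        "attributes": attributes,
        "commands": [{"name": "ping", "type": "command"}],
    }]}


def register_device(base_url: str, payload: dict, service: str, service_path: str) -> int:
    """POST the payload to {base_url}/iot/devices and return the HTTP status code."""
    req = Request(
        base_url.rstrip("/") + "/iot/devices",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Fiware-Service": service,            # assumption: FIWARE multi-tenancy headers
            "Fiware-ServicePath": service_path,
        },
        method="POST",
    )
    with urlopen(req, timeout=30) as resp:   # raises HTTPError on 4xx/5xx answers
        return resp.status
```

For example, register_device("http://localhost:4041", device_payload("raspberryPI1", "Field1"), "DataBio", "/Tragsa") would register the device, where the host, port and tenant values are placeholders.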

Data Handling Management:

End-point URL: http://{$host_url:$host_port}/v1/admin/config

Payload example in JSON format:

{
  "service": "DataBio",
  "servicePath": "/Tragsa",
  "host": "http://localhost:8080",
  "in": [
    {
      "id": "Field1",
      "type": "Field",
      "providers": [ "http://localhost:8081" ],
      "attributes": [
        { "name": "leaf_condensation", "type": "double" },
        { "name": "temperature", "type": "double" },
        { "name": "humidity", "type": "double" },
        { "name": "soil_humidity", "type": "double" },
        { "name": "Device", "type": "string" }
      ]
    }
  ],
  "out": [
    {
      "id": "DataBioEvent1",
      "type": "DataBioEvent",
      "attributes": [
        { "name": "leaf_condensation", "type": "double" },
        { "name": "temperature", "type": "double" },
        { "name": "humidity", "type": "double" },
        { "name": "soil_humidity", "type": "double" },
        { "name": "Device", "type": "string" }
      ],
      "brokers": [
        { "url": "http://localhost:1026", "serviceName": "DataBio", "servicePath": "/Tragsa" }
      ]
    }
  ],
  "statements": [
    "INSERT INTO DataBioEvent SELECT leaf_condensation, temperature, humidity, soil_humidity, Device FROM Field WHERE leaf_condensation < 90 AND temperature > 15 AND humidity > 20 AND humidity < 90 AND soil_humidity > 0 AND soil_humidity > 50"
  ]
}
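The statement in this configuration filters updates of the Field entity and republishes matching readings as DataBioEvent entities. To make the rule easier to check, the sketch below evaluates the same condition in Python. It assumes the chained comparisons are read as conjunctions: humidity strictly between 20 and 90, and soil_humidity greater than both 0 and 50. FieldReading, matches_rule and select_events are hypothetical names; in the platform the filtering runs inside the event-processing engine, not in Python.

```python
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class FieldReading:
    """One update of the "Field" entity, with the attributes listed in the config."""
    leaf_condensation: float
    temperature: float
    humidity: float
    soil_humidity: float
    Device: str


def matches_rule(r: FieldReading) -> bool:
    """Python version of the WHERE clause, with chained comparisons read as conjunctions."""
    return (
        r.leaf_condensation < 90
        and r.temperature > 15
        and 20 < r.humidity < 90
        and r.soil_humidity > 0
        and r.soil_humidity > 50
    )


def select_events(readings: Iterable[FieldReading]) -> List[dict]:
    """Emulate INSERT INTO DataBioEvent SELECT ...: keep the attributes of matching readings."""
    return [asdict(r) for r in readings if matches_rule(r)]
```

Note that under this reading the soil condition reduces to soil_humidity > 50, which is worth confirming against the pilot's intended threshold.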


Concluding remarks

The DataBio project is an EU lighthouse project with twenty-six pilots running at over a hundred pilot sites across Europe in the three main bioeconomy sectors: agriculture, forestry and fishery. These sectors use, process and produce many datasets and data streams that create value for both businesses and governments. This deliverable provides an overview of the datasets in the context of the DataBio platform and pilots, giving the reader insight into why the data is needed, what the data provides and how it can be retrieved.

The requirements from the pilots and the platform identify the datasets needed by the pilot applications. The ArchiMate models provide trace links to the relevant components, requirements and application goals, allowing users to carry out coverage and orphan analysis as well as traditional trace navigation.
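Coverage and orphan analysis reduce to simple set operations once the trace links are exported from the models as (source, target) pairs. The sketch below assumes such an export; the element identifiers and function names are hypothetical and are not taken from the DataBio ArchiMate models. Here a requirement counts as covered when it traces to at least one element, and an orphan is an element that takes part in no trace link at all.

```python
from typing import Dict, List, Set, Tuple

# A trace link points from a source element to a target element (hypothetical export format).
Link = Tuple[str, str]


def coverage(requirements: Set[str], links: List[Link]) -> float:
    """Share of requirements with at least one outgoing trace link."""
    if not requirements:
        return 1.0
    traced = {src for src, _ in links if src in requirements}
    return len(traced) / len(requirements)


def orphans(elements: Set[str], links: List[Link]) -> Set[str]:
    """Elements that take part in no trace link at all."""
    linked = {element for link in links for element in link}
    return elements - linked


def trace(start: str, links: List[Link]) -> Set[str]:
    """All elements reachable from start by following links forward (trace navigation)."""
    successors: Dict[str, List[str]] = {}
    for src, dst in links:
        successors.setdefault(src, []).append(dst)
    reached: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached
```

For instance, with links GOAL-1 → REQ-1 → DS-A → COMP-1 and an untraced REQ-2, coverage over {REQ-1, REQ-2} is 0.5 and REQ-2 is reported as an orphan.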

The overview of datasets shows that the DataBio pilots currently use 14 existing datasets, improve 6 datasets by processing them or enriching them with other data points, and create a total of 23 new datasets. Each dataset is described with metadata in the DataBioHub. These numbers are expected to grow during the project’s lifetime.

The first phase of the DataBio project has focused on the use and creation of datasets based on the needs and requirements of the DataBio pilots. The next phase will continue this work, but will also focus more on the interoperability of datasets through the use of ontologies, potential standard data models, and access mechanisms such as services and APIs. Furthermore, there will be an increased focus on secure data sharing and data exchange beyond the individual pilots, to support a growing data economy in the DataBio sectors of agriculture, forestry and fishery.


References

[REF-01] European Commission, 2018: https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=COM:2018:0232:FIN

[REF-02] European Open Data Portal: https://data.europa.eu/euodp/data/

[REF-03] European Commission, January 2017: https://ec.europa.eu/digital-single-market/en/policies/building-european-data-economy

[REF-04] Transforming Transport (web): https://data.transformingtransport.eu/

[REF-05] Dunning, A. (2017). ‘Are FAIR data principles FAIR?’ LIBER Webinar. http://www.ijdc.net/article/view/567. Retrieved 2018-08-21.

[REF-06] Press, G. (2016). ‘Cleaning Big Data: most time-consuming, least enjoyable data science task, survey says’, Forbes [Internet]. https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#3cfa77426f63. Retrieved 2018-08-21.

[REF-07] Mons, B. et al. (2016). Realising the European Open Science Cloud. https://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf. Retrieved 2018-08-21.

[REF-08] Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. doi:10.1038/sdata.2016.18.

[REF-09] FORCE 11 (2014). https://www.force11.org/fairprinciples. Retrieved 2018-08-21.

[REF-10] European Commission (2016). H2020 Programme Guidelines on FAIR Data Management in Horizon 2020. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 2018-08-21.

[REF-11] DataBioHub: https://www.databiohub.eu

[REF-12] https://www.earthobservations.org/geoss.php

[REF-13] https://inspire.ec.europa.eu/sites/default/files/geodcat-ap.pdf

[REF-14] http://micka.bnhelp.cz/

[REF-15] https://ckan.org/

[REF-16] 5-star scheme, Tim Berners-Lee: https://5stardata.info/de/


[REF-17] GO FAIR Initiative: https://www.go-fair.org/

[REF-18] Dublin Core Metadata Initiative: http://dublincore.org/

[REF-19] Creative Commons: https://creativecommons.org/ns

[REF-20] DataBio deliverable D6.2 “Data Management Plan”, June 30, 2017

[REF-21] Common license types for datasets: https://help.data.world/hc/en-us/articles/115006114287-Common-license-types-for-datasets. Retrieved 2018-08-21.

[REF-22] DataBio deliverable D5.i2 “EO data sets, formats and sets”: https://rid-redmine.intrasoft-intl.com/projects/databio/dmsf?folder_id=1685


Appendix A Metadata template table

| Field | Value |
|---|---|
| Internal Name of the Dataset | |
| Name of the Dataset/API | |
| Provider | |
| Short Description | |
| Extended Description | |
| Version | |
| Initial Availability Date | |
| Data Type | |
| Personal Data | |
| Rightsholder | |
| Other Rights Information | |
| Dataset/API Owner/Responsible | |
| Dataset/API Owner/Responsible Contacts | |
| Technology | |
| Name of the System | |
| Dataset Data Model/API Interface | |
| Data Model: Standards, Glossaries and metadata standards | |
| Data Identifier - Standard used | |
| Data Model - Specific Data Model | |
| Data Volume | |
| Update Frequency | |
| Data Archiving and preservation | |
| Geographical Coverage | |
| Timespan | |
| Access Level | |
| Access Mechanism | |
| URI | |