A5.2-D3 [3.5] Workflow Design and Construction Service

A5.2-D3 [3.5] Workflow Design and Construction Service Component Specification

Title:

A5.2-D3 [3.5] Workflow Design and Construction Service Component Specification

Author(s)/Organisation(s):

Daniel Fitzner (FHG), Moses Gone (FHG)

Working Group:

Architecture Team / WP5

References:

A5.2-D3 [3.2] Mediator Service Component Specification

A5.2-D3 [3.4] Context Service Specification

A5.2-D3 [3.6] Processing Components General Model and Implementations

A5.2-D3 [3.7] Information Grounding Service Component Specification

A5.3-D3 Humboldt Commons Specification / Framework Common Data Model V3

Quality Assurance:

Review WP Leader: Thorsten Reitz (FhG)

Review dependent WP leaders:

Review Executive Board:

Review others: Zaheer Khan (UWE), Ulrich Schäffler (TUM), Marian de Vries (TUD)

Delivery Date: 30.11.2009

Short Description:

This document specifies the Workflow Design and Construction Service (WDCS) component developed as part of the HUMBOLDT software framework. The WDCS enables users to specify chains of geoprocessing services / functionality. This specification follows the RM-ODP (ISO/IEC 10746) and describes the responsibilities of the Workflow Service and its collaborations with the other service components described in the main HUMBOLDT specification documents, A5.2-D3 [3.0] and [3.1], which also give an overview of the entire framework.

Keywords:

Geospatial Workflow, Transformer, Web Processing Service (WPS), Geoprocessing, Harmonisation, Transformation, Web Service

History:

Version Author(s) Status Comment

001 Daniel Fitzner new Newly created version partially based on V1.0 specification.

002 Daniel Fitzner Reworked according to review from Marian de Vries and Thorsten Reitz.

003 Daniel Fitzner WSDL added

004 Daniel Fitzner Reworked according to review by Ulrich Schäffler and Zaheer Khan

005 Daniel Fitzner final Finalised

Table of contents

1 Introduction
  1.1 Abbreviations and Definitions used in this document
  1.2 Standards used in this document
    1.2.1 OGC Web Processing Service (OGC 05-008r4)
    1.2.2 OGC Geography Markup Language (GML)

2 Enterprise Viewpoint
  2.1 Business Process Overview
  2.2 A Simple Example
  2.3 Actors in this component
  2.4 WDCS Use Cases
  2.5 Scenario Integration
    2.5.1 Definition of the application specific processing chain
    2.5.2 Workflow Construction and Execution

3 Computational Viewpoint
  3.1 Transformers
    3.1.1 Execution-relevant Metadata
    3.1.2 Composition-relevant Metadata
      3.1.2.1 Metadata on the input / output parameters
    3.1.3 Metadata of Harmonisation Processing Components
  3.2 Workflows
    3.2.1 Basic Workflow Design
      3.2.1.1 Pre- / Postcondition Matching
      3.2.1.2 Constraint propagation
    3.2.2 Example: Basic Workflow Design
    3.2.3 Automated Creation of Executable Workflows
      3.2.3.1 Enriching the Basic Workflow with request specific constraints
      3.2.3.2 Discovery of data services
      3.2.3.3 Automated Harmonisation
    3.2.4 Example: Executable Workflow Creation
  3.3 Interactions of the WDCS with other framework components
    3.3.1 Interactions within UC 01
    3.3.2 Interactions within UC 06
    3.3.3 Workflow Generator (WG) Module
    3.3.4 The Repository Manager (RM) Module

4 Information Viewpoint
  4.1 The Workflow Interface
  4.2 Transformer interfaces
    4.2.1 Transformer
    4.2.2 TransformerDescriptionDTO
    4.2.3 HarmonisationTransformerDescriptionDTO
    4.2.4 TransformerDescription
  4.3 Pre-/Postconditions interfaces
    4.3.1 Data Structure for Preconditions
    4.3.2 Data Structure for Postconditions
  4.4 Data Structure for Workflow Exchange

5 Summary and Outlook

6 References

7 Annex
  Annex A – Use Case Descriptions
    UC WS01 – Create Basic Workflow
    UC WS02 – Edit Basic Workflow
    UC WS03 – Register Transformer
    UC WS04 – Register Harmonisation Transformer
    UC WS05 – Manage Transformer
    UC WS06 – Execute Workflow
  Annex B – WDCS WSDL
  Annex C – Figures

Figures

Figure 1: The business processes of the WDCS
Figure 2: Example: A geospatial workflow
Figure 3: The Workflow Service Component Use Cases
Figure 4: Methodology for calculating the hiking paths
Figure 5: The input/output signature of the Basic Workflow "Sustainable Hiking Paths"
Figure 6: Discovery of input
Figure 7: Harmonisation requirements
Figure 8: The final executable workflow (extract)
Figure 9: The structure of complex conditions
Figure 10: The structure of simple Conditions
Figure 11: Example: A Buffer Transformer
Figure 12: The structure of Basic Workflows
Figure 13: Example: Matching Simple Conditions
Figure 14: Example: The Basic Workflow Design process
Figure 15: Example: The result of the Basic Workflow Design Process
Figure 16: Example: A Basic Workflow without harmonization Transformers
Figure 17: Example: A Workflow ready for execution (Executable Workflow)
Figure 18: Component Diagram of the Workflow Design and Construction Service
Figure 19: The main interactions within UC 01
Figure 20: The main interactions within UC WS01
Figure 21: The Workflow Interface
Figure 22: Transformer Interfaces
Figure 23: Preconditions
Figure 24: Postconditions
Figure 25: Large version of Figure 16
Figure 26: Large Version of Figure 17

1 Introduction

The HUMBOLDT Workflow Design and Construction Service (WDCS) enables the creation of geospatial workflows that answer complex geospatial requests which cannot be answered by a single data service but require further processing.

Within Section 2, the Enterprise Viewpoint of the WDCS (according to the RM-ODP methodology) is described. This section introduces the business process within the HUMBOLDT framework involving the WDCS. The aim of Section 3 (Computational Viewpoint) is to give a detailed description of the approach for workflow design followed within HUMBOLDT. The section additionally contains a description of the interfaces exposed by the WDCS. The specification concludes with the Information Viewpoint in section 4, containing the data structures used internally and externally by the WDCS.

1.1 Abbreviations and Definitions used in this document

Abbrev. Name Definition

WDCS Workflow Design and Construction Service

The WDCS enables the creation of geospatial workflows based on web service technology. The workflow service additionally offers a GUI for managing Workflows and Transformers and for retrieving/storing them.

- Transformer

In HUMBOLDT, processing components that take geodata as input, transform it and output the transformed geodata are called Transformers. Such processing functionality is either encapsulated within OGC Web Processing Services or implemented directly on the platform on which the HUMBOLDT Mediator Service is deployed.

Transformers can be distinguished according to the functionality they implement:

• Harmonisation Transformers: Implement some type of transformation related to the HUMBOLDT constraint model such as the transformation between spatial reference systems.

• GIS Transformers: Implement well-known GIS Transformations, such as Buffer or Overlay.

• Application specific Transformers: Implement specific transformations within a certain application area, such as a climate change model.

In the following, GIS Transformers and application specific Transformers are referred to as non-harmonisation Transformers.

IGS Information Grounding Services

The Information Grounding Service component offers the capabilities to discover spatially referenced resources. This component manages one or more Grounding Catalogues to determine which Grounding Services are available in the HUMBOLDT system. See the IGS documentation, A5.2-D3 [3.7] Information Grounding Service Component Specification.

MS Mediator Service

HUMBOLDT Workflow Execution engine that is responsible for executing the workflows delivered by the WDCS. Specified in A5.2-D3 [3.2] Mediator Service Component Specification.

CS Context Service

HUMBOLDT Service for management and delivery of user contexts. Specified in A5.2-D3 [3.4] Context Service Specification.

BW Basic Workflow

A Basic Workflow comprises one or several Transformers and is manually created by a human workflow designer (usually a data custodian).

RM Repository Manager

The Repository Manager holds Basic Workflows (BW) as well as Transformers.

MCR Mediator Complex Request

The MediatorComplexRequest is a data structure used internally within the Framework. It acts as a container- and interface-neutral structure holding the constraints of the request a client sent to the Mediator node. See Mediator Component documentation Deliverable A5.2-D3

pre Transformer Preconditions

Transformer Preconditions are the HUMBOLDT-internal representation of conditions that must be fulfilled before the processing component / Transformer (a WPS or Java process) can be executed. During workflow design and construction, the preconditions are used to determine whether Transformers can be connected, i.e. whether they are syntactically compatible.

post Transformer Postconditions

Transformer Postconditions are the HUMBOLDT-internal representation of conditions that are fulfilled after the processing component / Transformer has been executed. Like preconditions, postconditions are used to determine whether Transformers can be connected, i.e. whether they are syntactically compatible.
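The role of pre- and postconditions in connecting Transformers can be illustrated with a minimal sketch. It assumes that a condition is a simple attribute/value pair and that two Transformers are compatible when every precondition of the consuming input is met by an equal postcondition of the producing output. The class `Condition`, the function `is_compatible` and the attribute names are hypothetical and are not taken from the HUMBOLDT data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One constraint on a parameter (hypothetical shape, e.g. format == 'GML')."""
    attribute: str
    value: str


def is_compatible(postconditions, preconditions):
    """Return True if every precondition is met by an equal postcondition."""
    offered = {c.attribute: c.value for c in postconditions}
    return all(offered.get(p.attribute) == p.value for p in preconditions)


# Postconditions of the output of a hypothetical Buffer Transformer.
buffer_post = [Condition("format", "GML"), Condition("geometry", "Polygon")]
# Preconditions of one input of a hypothetical Intersection Transformer.
intersect_pre = [Condition("format", "GML")]

print(is_compatible(buffer_post, intersect_pre))
print(is_compatible(buffer_post, [Condition("format", "Shapefile")]))
```

The first check succeeds because the only precondition is covered by a postcondition; the second fails because no postcondition offers the required format.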

1.2 Standards used in this document

This section lists and briefly describes the standards related to the WDCS.

1.2.1 OGC Web Processing Service (OGC 05-008r4)

The aim of the OGC Web Processing Service Specification [1] is to provide a standardized interface specification for multiple types of geoprocessing operations. The OGC WPS Specification does not fix the specific processes that can be offered (such as Overlay) but specifies an abstract interface that each WPS implementation must offer to clients. This interface consists of the following three operations:

1. GetCapabilities – Allows clients to request service metadata (or capabilities) documents that describe the abilities of the specific WPS implementation.

2. DescribeProcess – Allows clients to request information about the processes that are offered. This usually includes information on parameter names and parameter types as well as a natural language description of the process.

3. Execute – This operation allows a client to run a specified process, using provided input parameter values.
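The three operations can be sketched as key-value-pair GET requests. The sketch assumes the parameter names of WPS version 1.0.0; the endpoint URL, the process identifier `Buffer` and the input name `distance` are hypothetical. Note that a server may accept Execute only as an XML document sent via HTTP POST, so the third URL is illustrative.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the address of a real WPS instance.
WPS_ENDPOINT = "http://example.org/wps"


def wps_url(request, **params):
    """Build a key-value-pair GET URL for one WPS operation."""
    query = {"service": "WPS", "request": request}
    query.update(params)
    return WPS_ENDPOINT + "?" + urlencode(query)


# 1. Service metadata.
capabilities_url = wps_url("GetCapabilities")
# 2. Description of one process (identifier is an assumed example).
describe_url = wps_url("DescribeProcess", version="1.0.0", identifier="Buffer")
# 3. Execution with one literal input; urlencode escapes the '=' in the value.
execute_url = wps_url(
    "Execute",
    version="1.0.0",
    identifier="Buffer",
    datainputs="distance=5000",
)

for url in (capabilities_url, describe_url, execute_url):
    print(url)
```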

1.2.2 OGC Geography Markup Language (GML)

The OGC GML Specification [2] defines an XML encoding for spatial features. It covers both the spatial and the non-spatial properties of features and serves as a storage format as well as a data interchange format between different applications.
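As an illustration of how spatial and non-spatial properties sit side by side in a GML-encoded feature, the following sketch parses a small document with the Python standard library. The feature type `app:PowerPlant`, its namespace and its property names are made up for this example; only the `gml` namespace and the `gml:Point` / `gml:pos` elements come from GML itself.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # namespace used by GML up to 3.1

# Hypothetical feature: the 'app' namespace and element names are made up.
DOCUMENT = """
<app:PowerPlant xmlns:app="http://example.org/app"
                xmlns:gml="http://www.opengis.net/gml">
  <app:name>Plant A</app:name>
  <app:location>
    <gml:Point srsName="EPSG:4326">
      <gml:pos>48.1 11.6</gml:pos>
    </gml:Point>
  </app:location>
</app:PowerPlant>
"""

feature = ET.fromstring(DOCUMENT.strip())
# Non-spatial property.
name = feature.find("{http://example.org/app}name").text
# Spatial property: a point geometry with its spatial reference system.
point = feature.find(f".//{{{GML_NS}}}Point")
coordinates = [float(v) for v in point.find(f"{{{GML_NS}}}pos").text.split()]

print(name, point.get("srsName"), coordinates)
```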

2 Enterprise Viewpoint

The aim of this section is to describe the functional purpose of the Workflow Design and Construction Service from a high-level perspective. First, the value of the component is described on an abstract and generic level in the Business Process Overview (section 2.1), followed by a simple example of a business process (section 2.2). Afterwards, a number of generic Actors are introduced (section 2.3), which provide the background for the Use Case descriptions in section 2.4. The Enterprise Viewpoint concludes with an application of the use cases to one of the HUMBOLDT Scenarios, namely the Protected Areas Scenario (section 2.5).

2.1 Business Process Overview

The main business process offered by the workflow component is shown in Figure 1. Note that the process description is an abstraction of the concrete collaboration between HUMBOLDT components. The end user of geodata does not interact directly with the WDCS but employs the Mediator Service. The collaborations of the WDCS with other HUMBOLDT components are described in detail in section 3.3.

The whole process is initiated by an end user in need of geodata. If a process description (in HUMBOLDT terminology, a Basic Workflow) that can provide the requested data already exists, the end user defines and submits a product definition. The product definition includes the requested Feature Type, which serves as a unique identifier for the Basic Workflow the user wants to execute, and a list of constraints on the data, e.g. concerning the spatial reference system or bounding box. Once submitted, the product definition is stored internally. Every time the end user requests geodata, the product definition is used to provide an executable process definition, in HUMBOLDT called an Executable Workflow. Executing this workflow finally yields the geodata product requested by the user.

If no process definition is available, the end user contacts the data custodian, a domain expert, and requests one. The data custodian browses all available processes, in HUMBOLDT called Transformers, and combines those necessary for achieving the task.

If processing components are missing, the data custodian contacts the data integrator, who is responsible for the management, implementation and registration of processing components in the HUMBOLDT framework. The data integrator implements, or discovers, and registers the required processing components.

Figure 1: The business processes of the WDCS

Extracted from the main Business process, the three main capabilities offered by the WDCS can be summarized as follows:

Registration of processing functionality: The registration of processing functionality (encapsulated within WPS or directly implemented on the mediator platform) results in a HUMBOLDT internal representation of that process. This task is carried out by the data integrator.

Basic Workflow Design: The composition of Transformers is a manual task and results in a process description called a Basic Workflow. A Basic Workflow is a chain of geoprocesses that together achieve some geospatial task. A Basic Workflow is abstract in the sense that it contains no information on concrete data services providing input, only constraints on potential input data sets.

Provision of executable workflows: In the presence of a user request (i.e. a product definition), the workflow service delivers executable process descriptions, called Executable Workflows. They differ from Basic Workflows in that they are built automatically in such a way that their output satisfies the product definition. In contrast to Basic Workflows, they include all information necessary for execution, such as information on data services that provide suitable input to the workflow.

2.2 A Simple Example

Consider the example of a user who wants to receive data on nuclear power plants. Precisely, the user states that he wants to have “the locations of all power plants within a 5 kilometres radius of cities”. We assume that there is no process description that can “answer” this request, so the end user contacts the data custodian (via communication structures outside the HUMBOLDT framework). The data custodian is a GI expert and therefore knows that the end user request can be answered by a chain of geoprocesses (together with suitable input data). This chain consists of a 5 km buffer around a city layer and the subsequent spatial intersection with a power plants layer, as shown in Figure 2.

Figure 2: Example: A geospatial workflow

The data custodian browses the repository for Intersection and Buffer Transformers. We assume that both are available (and therefore skip the process of creating new Transformers / registering new processes), so they can be connected. The data custodian builds a workflow by connecting the output of the Buffer Transformer to one of the inputs of the Intersection Transformer (in HUMBOLDT, this is done using the WDCS GUI or Frontend).

To ensure that the chain of Transformers can be used to answer the end user request, the inputs to the chain need to be thematically constrained: the input to the Buffer Transformer must be a layer representing cities, and the second input to the Intersection Transformer (besides the output of the Buffer Transformer) must be a layer representing power plants. The data custodian specifies these constraints on the inputs; later, in the presence of a user request, they are used as a discovery query to identify suitable data sources / services. Additionally, the second input to the Buffer Transformer, the buffer distance, is manually set to 5 km. This completes the task of the data custodian and results in an abstract process description called a Basic Workflow. Basic Workflows are abstract in the sense that they do not contain information on concrete data services that can deliver input to the process. This abstract nature allows Basic Workflows to be reused in a number of different application contexts, e.g. with spatial data covering different areas or adhering to different spatial reference systems.
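The Basic Workflow built by the data custodian can be sketched as a small data structure. This is an illustrative model only: the classes `Transformer` and `BasicWorkflow`, their fields and the parameter names are assumptions and do not reflect the HUMBOLDT interfaces. The sketch shows that, after connecting the Transformers and fixing the buffer distance, exactly two inputs remain open and carry thematic constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Transformer:
    name: str
    inputs: list  # names of the input parameters


@dataclass
class BasicWorkflow:
    transformers: dict = field(default_factory=dict)
    # (transformer, input) -> name of the Transformer whose output feeds it
    connections: dict = field(default_factory=dict)
    # (transformer, input) -> value set manually by the data custodian
    fixed_values: dict = field(default_factory=dict)
    # (transformer, input) -> thematic constraint on the data to be bound
    constraints: dict = field(default_factory=dict)

    def add(self, transformer):
        self.transformers[transformer.name] = transformer

    def open_inputs(self):
        """Inputs that are neither connected nor fixed; they need data services."""
        return [
            (t.name, i)
            for t in self.transformers.values()
            for i in t.inputs
            if (t.name, i) not in self.connections
            and (t.name, i) not in self.fixed_values
        ]


wf = BasicWorkflow()
wf.add(Transformer("Buffer", ["layer", "distance"]))
wf.add(Transformer("Intersection", ["first", "second"]))
wf.connections[("Intersection", "first")] = "Buffer"
wf.fixed_values[("Buffer", "distance")] = "5 km"
wf.constraints[("Buffer", "layer")] = "cities"
wf.constraints[("Intersection", "second")] = "power plants"

print(wf.open_inputs())
```

The two open inputs are the ones for which data services have to be discovered later; the workflow itself names no concrete service, which is what keeps it reusable.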

The Basic Workflow can now be used to answer the end user request for “the locations of all power plants within a 5 km radius of cities”, provided suitable input data is bound to the workflow. In addition, the end user might have further constraints on the data to be returned, such as a specific spatial reference system. The set of all these constraints, together with an identifier of the Feature Type “Power Plants within 5 KM of cities” that identifies the Basic Workflow, is what we call a Product Definition.
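A Product Definition can be pictured as a Feature Type identifier plus a set of constraints. The following sketch is a simplified assumption: the class `ProductDefinition`, the repository dictionary, the workflow identifier `bw-power-plants` and the constraint keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProductDefinition:
    feature_type: str  # identifies the Basic Workflow to execute
    constraints: dict = field(default_factory=dict)


# Hypothetical repository: Feature Type name -> Basic Workflow identifier.
REPOSITORY = {"Power Plants within 5 KM of cities": "bw-power-plants"}


def find_basic_workflow(product):
    """Return the Basic Workflow id for the product, or None if none is registered."""
    return REPOSITORY.get(product.feature_type)


product = ProductDefinition(
    feature_type="Power Plants within 5 KM of cities",
    constraints={"srs": "EPSG:4326", "bbox": (5.0, 45.0, 15.0, 55.0)},
)
print(find_basic_workflow(product))
```

A result of `None` corresponds to the case in which no Basic Workflow exists yet and the data custodian has to be contacted.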

Once the product definition has been submitted, its constraints are used internally to enrich the Basic Workflow according to the end user’s needs. Additionally, suitable grounding services that can provide input to the Basic Workflow (in this case, data services that deliver data on cities and power plants) are discovered and attached to the Basic Workflow. The result is a workflow that is ready for execution, called an Executable Workflow. The workflow execution itself is not part of the WDCS but is performed by the HUMBOLDT Mediator Service Component.
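The step from Basic Workflow to Executable Workflow can be sketched as follows. The sketch assumes a catalogue that maps service URLs to the theme they offer and simply binds the first matching service to each open input; the catalogue contents, the function names and the selection strategy are hypothetical and stand in for the discovery performed via the Information Grounding Service.

```python
# Hypothetical catalogue of grounding (data) services: URL -> theme offered.
CATALOGUE = {
    "http://example.org/wfs/cities": "cities",
    "http://example.org/wfs/plants": "power plants",
}

# Open inputs of the Basic Workflow with their thematic constraints.
OPEN_INPUTS = {
    ("Buffer", "layer"): "cities",
    ("Intersection", "second"): "power plants",
}


def discover(theme):
    """Return the URLs of all catalogued services that offer the theme."""
    return [url for url, offered in CATALOGUE.items() if offered == theme]


def make_executable(open_inputs, request_constraints):
    """Bind one data service to every open input and attach the request constraints."""
    bindings = {}
    for slot, theme in open_inputs.items():
        candidates = discover(theme)
        if not candidates:
            raise LookupError(f"no data service found for theme {theme!r}")
        bindings[slot] = candidates[0]
    return {"bindings": bindings, "output_constraints": dict(request_constraints)}


executable = make_executable(OPEN_INPUTS, {"srs": "EPSG:4326"})
print(executable["bindings"])
```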

2.3 Actors in this component

This table shows the different Actors that interact with the Workflow Design and Construction Service, including the component itself. Actors written in bold letters indicate system components of the HUMBOLDT framework.

Actor Name Actor Description

MS_END_USER For the purpose of this component, both End Users of Geoinformation and End Users of Geodata can be summarized into one actor. This actor does not directly use the Workflow Service (via the Workflow Service GUI) but employs the Mediator Service (via one of the standard OGC interfaces) for executing predefined workflows.

DATA_CUSTODIAN This is the standard data custodian actor from the introduction and specification overview document, chapter 4.2.1. This actor is responsible for managing (i.e. creating, editing, deleting) Basic Workflows via the WDCS GUI.

DATA_INTEGRATOR The DATA_INTEGRATOR is considered to be a computer science expert or at least to have very good knowledge of information systems and infrastructures. Within the Workflow Component Use Cases, this actor is responsible for the management of Transformers, e.g. the registration of WPS processes to the HUMBOLDT framework.

MS_SYSTEM This Actor represents a deployed and configured instance of the HUMBOLDT Mediator Service Component, i.e. the main system controller. MS_SYSTEM provides WDCS_SYSTEM with user requests and receives executable workflows.

WDCS_SYSTEM This Actor represents a deployed and configured instance of the Workflow Design and Construction Service Component.

IGS_SYSTEM This Actor represents a deployed and configured instance of the Information Grounding Service (IGS), i.e. the HUMBOLDT framework component responsible for management and discovery of geospatial data services.

2.4 WDCS Use Cases

This section introduces the Use Cases of the Workflow Service Component. Figure 3 below shows a UML Use Case Diagram of the main Use Cases of the Workflow Service Component.

Figure 3: The Workflow Service Component Use Cases

WDCS01: Create Basic Workflow: This process results in a Basic Workflow that can be used to solve some geospatial task.

WDCS02: Manage Basic Workflow: This process involves all management issues for Basic Workflows that can occur, such as editing, deleting etc.

WDCS03: Register Transformer: Within this use case, a new processing component is registered to the HUMBOLDT framework.

WDCS04: Register Harmonisation Transformer: Within this use case, a new harmonisation processing component is registered to the HUMBOLDT framework.

WDCS05: Manage Transformer: This use case involves all management tasks related to Transformers.

WDCS06: Request Executable Workflow: This use case describes the necessary steps that the workflow component takes to deliver an executable workflow in the presence of a concrete request.

A more detailed description of the use cases can be found in Annex A.

2.5 Scenario Integration

This section provides a more sophisticated example of how the WDCS is used within a HUMBOLDT scenario. The complete example, covering all necessary steps and all HUMBOLDT components involved, can be found in the document A5.2-D3 [3.1] Specification Introduction & Overview.

Mario has the task of creating a web portal that hikers can use to identify suitable hiking routes. An important goal of the portal project is to keep the portal independent of the data sources of any one region, so that it can be reused in different areas, for example in the border region between Spain and Portugal. Another technical goal is to enable users to employ OGC-conforming clients for retrieving data from the portal. Hence, the portal server must offer standardized OGC interfaces for data access, such as WFS or WMS.

The aim of this example is to show the use of the WDCS within the use case described above. Within the scenario description, the data integrator is embodied by Luigi, a programmer and IT expert. Luigi is responsible for maintaining the IT infrastructure of the protected areas management agency. The data custodian is represented by Carla, an expert in geospatial data models who is, together with Luigi, responsible for the data sets involved. Finally, the end user of geodata is embodied by Mario Rossi Buhl, a regional officer at the Territorial Planning Department of an Italian region. Mario is the person who has the initial need for the web portal on hiking paths and who is responsible for it. He is supported by Luigi and Carla in achieving his task.

Before the portal can be built, the data sets potentially involved in the calculation need to be identified. To do this, Mario makes the methodology for the calculation of the hiking routes explicit, which is described as follows:

The first and most important condition is that the hiking routes shall lead through the protected areas managed by the protected areas management agency, Mario’s employer. The goal is to make hikers familiar with the preservation of nature and to improve the income of the people living in that area.

Further, the hiking routes shall lead close to stopping places, e.g. places for overnight staying.

Additionally, there are some special protected areas that should not be entered by humans and therefore be avoided by the hiking paths.

Finally, the delivered data on the hiking paths should include some information on the area that is crossed, such as forest or wood.

Based on these requirements, the following minimum set of required data sets can be identified:

Footpaths and Hiking Trails: This data set delivers information on potential hiking paths.

Protected Areas: This data set delivers both data on protected areas that should be crossed by the hiking paths and data on special protected areas that must be avoided by the paths.

Stopping Places: This data set delivers data on stopping places such as places to stay overnight or panoramic places.

Vegetation: This data set delivers data on the vegetation coverage, such as forest, wood or rocks.

Based on the methodology and the data sets involved, an abstract chain of processing steps can be identified for calculating the paths. This chain is shown in the figure below.

Figure 4: Methodology for calculating the hiking paths

First, those areas are selected that should not be crossed by the paths, and the paths not crossing such areas are identified (a). Then, a buffer is calculated around stopping places, and only those hiking paths that are close to at least one of them are selected (b). Further, the information on vegetation is attached to the hiking paths (c). Finally, only those hiking paths that cross protected areas are selected (d).

2.5.1 Definition of the application specific processing chain

Involved HUMBOLDT User Groups: End User (Mario), supported by Data Integrator (Luigi)

Involved HUMBOLDT Components: Workflow Design and Construction Service (WDCS), WDCS GUI, Geospatial Processing Components (e.g. WPS) registered to HUMBOLDT (the WDCS)

The abstract chain of processing steps shown in Figure 4 can be transferred to a chain of concrete geoprocessing functionality. This is done by Mario, using the graphical user interface of the Workflow Design and Construction Service (WDCS), called the Workflow Frontend (WF). The WF enables him to connect the processing components registered to the system into a chain whose execution calculates the hiking paths according to the functional methodology for hiking routes calculation explained above. The resulting chain of geoprocessing functionality is called a Basic Workflow in HUMBOLDT terminology. Figure 5 shows the input/output signature of the Basic Workflow “Sustainable Hiking Paths”, abstracting from the concrete chain of processing.

Figure 5: The input/output signature of the Basic Workflow "Sustainable Hiking Paths"

The Basic Workflow Sustainable Hiking Paths is abstract in the sense that it is data independent. This means it does not yet hold information on concrete data services that deliver input. Instead, it contains constraints on the potential input data sets, such as schema constraints. As a result, the same methodology can be used in Italy as well as in Portugal and Spain, and needs to be defined only once. This becomes especially valuable when thinking of applications on a European scale, with possibly hundreds of different data sources.

During execution, concrete data download services (also called grounding services) that satisfy the input constraints are automatically discovered and attached to the workflow, resulting in an Executable Workflow.

The names of the layers that serve as input to the workflow, such as Protected Area, are the names of FeatureTypes from the integrated schemas to which all individual data schemas have previously been mapped using HALE. This mapping is necessary because, although most of the processing components within the workflow are schema-independent and operate on (GML) geometries, there can be schema-specific processing steps encapsulated within the workflow. For example, there is a selection of features based on a non-spatial attribute of the Protected Area schema. This selection is only possible if the data on protected areas that is used as input to the workflow adheres to the integrated schema for protected areas.

Finally, Mario registers the Basic Workflow he created to the HUMBOLDT system as a new extension to the Protected Areas integrated application schema as the Feature Type “Sustainable Hiking Paths”. The Basic Workflow is stored within the WDCS workflow repository and can now be requested as any other Feature Type, using a WFS compliant client such as OpenLayers.

2.5.2 Workflow Construction and Execution

Involved HUMBOLDT User Groups: End User of Geodata / Geoinformation (Mario), Data Custodian (Carla)

Involved HUMBOLDT Components: Mediator Service, Context Service, Workflow Design and Construction Service, Information Grounding Service

After defining the context, Mario requests the data using an OGC-conformant client. The component responsible for handling the request is the HUMBOLDT Mediator Service.

Based on the user request and the context, the Basic Workflow “Sustainable Hiking Paths” is retrieved and automatically enriched with request specific parameters, e.g. the specific bounding box and spatial reference system requested by Mario.

After enrichment, the input descriptions of the Basic Workflow are passed as a discovery query to the Information Grounding Service (IGS). For each input to the Basic Workflow, such as “Protected Areas”, the IGS returns pointers to a number of data services (e.g. WFS) that can deliver data.

However, since Mario requested data for an area that crosses the boundary between two countries and is therefore not served by a single data service, only a spatial combination of the two data sources covers the area requested by Mario. Figure 6 shows the two data services discovered for Protected Areas.

Figure 6: Discovery of input

The harmonisation requirements and the solution that is applied are shown in the following for the Protected Areas layer. The two data sources discovered (Protected Areas WFS1 and WFS2) can, if spatially combined, deliver input data to the Basic Workflow such that the output covers the area requested by Mario. However, the two WFS deliver data in different reference systems, since they are maintained by different data providers from different countries. Moreover, the method of data acquisition is different and therefore the two data sets differ in precision, which is directly visible on the boundary of the two regions, as shown in Figure 7.

Figure 7: Harmonisation requirements

Hence, before both WFS can serve as input to the Basic Workflow, they require harmonisation of spatial reference systems as well as the alignment of their common boundaries, a process known as Edge Matching. Based on the metadata of the discovered services WFS1 and WFS2 and the constraints on the input to the Basic Workflow representing “Protected Areas”, the required harmonisation transformations are automatically identified and attached to the Basic Workflow. Additionally, the data delivered by both WFS is transformed to the integrated target schema for protected areas, based on the predefined schema mapping. The workflow resulting from the automated insertion of harmonisation transformers is shown in Figure 8.

Figure 8: The final executable workflow (extract)

The automated process of adding harmonisation transformations to an application specific processing chain is called workflow construction and is performed for every single input (except the buffer distance, which is delivered by the user) to the Basic Workflow if required.

This automated harmonisation takes place within the WDCS. When finished, an executable workflow description is returned to the Mediator Service. After executing the workflow, the Mediator Service returns the result to Mario.

3 Computational Viewpoint

The aim of this chapter is to explain how the functionality described in Section 2 is achieved. It does not contain algorithms or implementation details but gives the generic conceptual background for workflows in HUMBOLDT.

3.1 Transformers

In HUMBOLDT, processing components that take some (geo-)data as input, carry out a transformation and output (geo-)data are called Transformers. We distinguish between harmonisation and non-harmonisation Transformers. The difference is simply that harmonisation processing components perform some task related to the HUMBOLDT constraint model. The HUMBOLDT constraint model, as specified in the document A5.3-D3 HUMBOLDT Commons Specification, contains a set of constraint types that can be used to specify constraints on the data sets to be returned. Examples are constraints for spatial reference systems, application schemas or language. If a processing component performs some sort of processing related to a constraint type (e.g. spatial reference system transformation, schema transformation or language transformation), it is called a harmonisation processing component or harmonisation Transformer. All others are non-harmonisation processing components. Further, harmonisation and non-harmonisation processing components are handled differently in the framework. Both can be used by users of the WDCS GUI for manually composing workflows. However, only harmonisation processing components are subject to automated composition. Hence, the process metadata required for harmonisation and non-harmonisation processing components are different.

3.1.1 Execution-relevant Metadata

Execution relevant metadata is equal for harmonisation and non-harmonisation processing components / Transformers and is passed from the WDCS to the HUMBOLDT Mediator Service. What is passed from the WDCS to the MS is a set of Transformer descriptions, each containing the following information.

How to access the processing components that are part of a workflow.

Within HUMBOLDT, processing components / Transformers can be implemented in two different ways. Either the processes are directly implemented in Java, or they are encapsulated in a WPS. The information necessary for enabling the HUMBOLDT Mediator Service to execute them differs:

1. Java Transformer: If a Transformer is implemented in Java, it is accessible to the MS. The only information that needs to be passed is the name of the Java class that holds the transformation.

2. WPS Transformer: If the process is offered via WPS, the WPS URL and the ProcessIdentifier (as it appears in the GetCapabilities response of the WPS) suffice.

How to instantiate the formal (named) parameters.

This information is essentially a mapping of named input parameters to something that can be used by the mediator to instantiate the parameters. The range of the mapping can have three different forms. The mapping maps a named input parameter either to…:

1. … a value, e.g. 5 or “input string”

2. … a named output of another processing component or (since we assume processing components have single outputs) the name / identifier of another processing component.

3. … a pointer to a data source, e.g. a WFS request.

Note that, since the input of a Transformer can point to another Transformer, the set of Transformer descriptions as passed from the WDCS to the MS forms a workflow / chain of Transformers. This is what is called an Executable Workflow in HUMBOLDT. A detailed description of the HUMBOLDT model for workflows can be found in section 3.2. Further, the exchange model for Executable Workflows can be found in the HUMBOLDT Commons specification.
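The structure described above can be illustrated with a minimal Python sketch. It is not the HUMBOLDT exchange model (which is defined in the HUMBOLDT Commons specification); all class names, field names, URLs and the Java class name below are hypothetical and chosen only for illustration. Each Transformer description carries its access information and a mapping of named inputs to one of the three forms (value, other Transformer, data source), and the chain is recovered by following the Transformer references.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Literal:
    """Form 1: a plain value, e.g. 5 or "input string"."""
    value: object


@dataclass(frozen=True)
class TransformerRef:
    """Form 2: the (single) output of another Transformer, by identifier."""
    transformer_id: str


@dataclass(frozen=True)
class DataSourceRef:
    """Form 3: a pointer to a data source, e.g. a WFS request."""
    request_url: str


Binding = Union[Literal, TransformerRef, DataSourceRef]


@dataclass
class TransformerDescription:
    identifier: str
    # Hypothetical keys: {"java_class": ...} for a Java Transformer, or
    # {"wps_url": ..., "process_identifier": ...} for a WPS Transformer.
    access: Dict[str, str]
    inputs: Dict[str, Binding] = field(default_factory=dict)


def execution_order(descriptions: List[TransformerDescription]) -> List[str]:
    """Order identifiers so that each Transformer follows the ones it reads from."""
    by_id = {d.identifier: d for d in descriptions}
    ordered, seen = [], set()

    def visit(tid: str) -> None:
        if tid in seen:
            return
        seen.add(tid)
        for binding in by_id[tid].inputs.values():
            if isinstance(binding, TransformerRef):
                visit(binding.transformer_id)
        ordered.append(tid)

    for d in descriptions:
        visit(d.identifier)
    return ordered


buffer_t = TransformerDescription(
    "buffer",
    {"wps_url": "http://example.org/wps", "process_identifier": "Buffer"},
    {"layer": DataSourceRef("http://example.org/wfs?typeName=StoppingPlaces"),
     "distance": Literal(5)},
)
select_t = TransformerDescription(
    "select",
    {"java_class": "org.example.SelectWithin"},
    {"paths": DataSourceRef("http://example.org/wfs?typeName=HikingTrails"),
     "area": TransformerRef("buffer")},
)

print(execution_order([select_t, buffer_t]))
```

Storing the reference on the consuming input, rather than on the producing output, mirrors the description above, where an input "points to" another Transformer.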

3.1.2 Composition-relevant Metadata

Composition-relevant metadata is only held internally by the WDCS and only used for composition. Every input of a Transformer is represented in the WDCS by a so-called precondition, the output by a so-called postcondition. We assume that Transformers can have multiple inputs / preconditions and a single output / postcondition. The purpose of such conditions is to enable the WDCS to automatically check whether two Transformers can be connected, e.g. whether they operate on equal datatypes.

Preconditions represent data input to the Transformer. They constrain a certain input on type level (e.g. integer) but also allow additional constraints on the acceptable values, such as spatial reference systems. In HUMBOLDT, a Transformer has several preconditions, each of which represents a single data input to the underlying process. Preconditions can be either simple or complex conditions.

Pre- and postconditions enable automated support during workflow design. This means, when a user connects two Transformers, the system automatically checks the compatibility of these two Transformers based on the pre- and postcondition specifications. The automated compatibility checking involves data types as well as other constraints on the input / output parameters.

3.1.2.1 Metadata on the input / output parameters:

In case a user connects two Transformers, the system first checks for type compatibility, that is, whether the output type of the source Transformer is a subtype of the type of the input of the target Transformer. Since, for WPS, this basically amounts to (GML) schema or XSD matching, and since we allow users to combine Java Transformers and WPS in an unrestricted way, there is a HUMBOLDT-internal type system that allows for subtype / supertype checking without considering the different encodings of the data passed between the Transformers. For example, if the system (WDCS) allows a Java Transformer to provide input to a WPS that accepts GML, then in principle the meaning is: “The serialised (to GML) Java object that is the output of the Java Transformer validates against the GML schema required by the WPS”.

Complex Conditions:

Complex Conditions represent complex data, e.g. a polygon layer encoded in GML. Complex Conditions comprise a set of constraints, such as a type constraint (e.g. a certain schema) and a list of required SRS. All constraints refer to the HUMBOLDT constraint model as specified in the document A5.3-D3 HUMBOLDT Commons Specification / Framework Common Data Model V3. For example, the SRS constraint holds the EPSG codes of all reference systems allowed. An empty constraint within a Complex Condition means that the Complex Condition does not constrain/restrict the data element concerning this particular constraint type. For example, an empty SRS constraint means that every spatial reference system is allowed.
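The "empty constraint means unrestricted" rule can be made concrete with a small Python sketch. The class and field names are hypothetical and only two constraint types are modelled; the actual constraint model is the one defined in the HUMBOLDT Commons specification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ComplexCondition:
    # Type constraint: name of a required schema, or None if unconstrained.
    schema: Optional[str] = None
    # SRS constraint: allowed EPSG codes; an empty set means "any SRS".
    srs: FrozenSet[str] = frozenset()

    def permits_srs(self, code: str) -> bool:
        """An empty SRS constraint does not restrict the data element."""
        return not self.srs or code in self.srs


restricted = ComplexCondition(schema="ProtectedAreas",
                              srs=frozenset({"EPSG:4326", "EPSG:31467"}))
unrestricted = ComplexCondition()

print(restricted.permits_srs("EPSG:4326"))
print(restricted.permits_srs("EPSG:25832"))
print(unrestricted.permits_srs("EPSG:25832"))
```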

Figure 9: The structure of complex conditions

Simple Conditions:

Simple Conditions represent simple data such as strings or integers. Simple Conditions can comprise a unit of measure constraint (UoM) and a list of allowed values.

[Figure 10 content: a Simple Condition consists of a Type Constraint, a Units of Measure constraint and a List of allowed values.]

Figure 10: The structure of simple Conditions

Shared Constraints

Pre- and postcondition specifications are used to constrain possible input and output values. In most cases, however, those values are not fully separate but interrelated. This includes cross-parameter constraints, such as a common reference system for all input layers of a specific process. Another example is the relation from input to output. Consider a geoprocess that accepts input data in multiple (GML) schemas X, Y and Z and produces an output within the same schema as the input. The common way to express this is that both the pre- and the postcondition of that process contain a list of allowed schemas, namely X, Y and Z, without explicitly stating the relationship, i.e. that the output schema depends on the input schema. More precisely: if the service inputs data in schema X, it outputs data in schema X; if it inputs data in schema Y, it outputs data in schema Y, and so on.

These interdependencies between different input parameters as well as input/output parameters are reflected within HUMBOLDT by so-called shared constraints or conditions.

Example

Figure 11 shows a graphical representation of a Buffer Transformer encapsulated within a WPS. It creates a Buffer around the features of the input layer (with the specified distance). Consequently, this process is represented as a HUMBOLDT Transformer with two preconditions and a single postcondition.

Figure 11: Example: A Buffer Transformer

The input layer is translated to a Complex Precondition and the Buffer Distance (an integer) to a Simple Precondition within HUMBOLDT. Since the process is implemented as a WPS process, some constraints (indicated by red letters) are automatically derived, e.g. from the response to the DescribeProcess request for the Buffer process. The other ones are manually added. The Transformer additionally holds all information needed for executing the Buffer process, i.e. the URL of the WPS that encapsulates the process as well as the process identifier as it appears in the response to the GetCapabilities request.

The SRS Constraint within the Complex Precondition that represents the input layer contains a constraint on spatial reference systems. Hence, only layers with geometries specified in epsg:4326 (WGS84) or epsg:31467 (Gauß-Krüger) can serve as input to this process. The SRS Constraint within the Complex Postcondition expresses that the process outputs layers with geometries specified in epsg:4326 or epsg:31467. However, the exact meaning of these constraints is: “whenever the process inputs epsg:4326 data, the output will be epsg:4326 data” and “whenever the process inputs epsg:31467 data, the output will be epsg:31467 data”. This means that the service does not perform any spatial reference system transformation on the input. Expressing this relationship from input to output is one of the purposes (besides expressing cross-parameter constraints) of shared constraints (indicated by blue letters in Figure 11). Since WPS process descriptions do not allow such kinds of relationships to be expressed, this information must be provided manually when registering a process.
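The difference between an independent and a shared SRS constraint for the Buffer example can be sketched in Python as follows. The function name and the `shared` flag are hypothetical; the sketch only illustrates the input-to-output relationship described above and is not how the WDCS stores shared constraints.

```python
def possible_output_srs(input_srs, allowed_srs, shared=True):
    """Return the SRS codes the output may have for a given input SRS.

    allowed_srs is the list that appears in both the pre- and the
    postcondition. With shared=True the output SRS is tied to the input SRS;
    with shared=False the postcondition alone only tells us that the output
    is one of the allowed codes.
    """
    if input_srs not in allowed_srs:
        raise ValueError(f"{input_srs} violates the SRS precondition")
    if shared:
        return {input_srs}
    return set(allowed_srs)


ALLOWED = {"epsg:4326", "epsg:31467"}

# With the shared constraint, the result is pinned to the input SRS.
print(possible_output_srs("epsg:31467", ALLOWED))
# Without it, a downstream check cannot tell which of the two will be delivered.
print(sorted(possible_output_srs("epsg:31467", ALLOWED, shared=False)))
```

Without the shared constraint, a target Transformer that accepts only epsg:4326 could not be safely connected, because the postcondition alone would also admit epsg:31467 output.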

3.1.3 Metadata of Harmonisation Processing Components

The need for automated harmonisation can occur either at the beginning (harmonising data services to workflow preconditions) or at the end (harmonising the workflow output to a user request) of a workflow. The automated harmonisation is based on the automated insertion of harmonisation Transformers into a Basic Workflow. Like non-harmonisation Transformers, they are either encapsulated within a WPS or directly implemented in Java and accessible to the MS.

For the harmonisation Transformers, the knowledge or metadata to be captured is the same as for the non-harmonisation Transformers. However, for automated execution / dynamic binding, this knowledge is not sufficient. Therefore, the following additional knowledge must be provided, i.e.

1. What harmonisation issue does this processing component refer to / solve?

There are several ways to make this knowledge accessible to the system.

First, when registering a harmonisation processing component, the user / service provider just tags the component with a harmonisation category known to the framework, for example “schema transformation”. This enables the WDCS to recognize the meaning of the process. However, the system still needs to know when to apply this component, i.e. there must be some knowledge of when to apply processing components of type “schema transformation”. Within the HUMBOLDT framework, this means that the system must know which constraint violation of the HUMBOLDT constraint model requires the application of schema transformation. A constraint violation often cannot be solved by transforming a single data source alone (as is possible e.g. for a schema constraint violation), but only by combining / transforming multiple data sources.

Second, when registering a harmonisation processing component the knowledge on when to apply it is provided. This means, the service provider specifies under which circumstances (i.e. constraint and data source combinations) the processing component must be applied.

How this is handled in HUMBOLDT:

In HUMBOLDT, we have a fixed set of harmonisation categories. While new Transformers / processing components that transform the data with respect to these categories can be added to the framework without programming, extending the harmonisation categories themselves without programming is currently not possible. The reason is that the execution logic (i.e. when to apply a specific harmonisation component etc.) is implemented as part of the framework, so the harmonisation categories must already be known when the framework is implemented. However, when registering a harmonisation processing component to the framework, it only needs to be “tagged” with a harmonisation category in order to enable the WDCS to recognize its meaning.

2. How does the WDCS recognise the meaning of the inputs?

Since the harmonisation processing components are subject to automated insertion by the WDCS, the WDCS must know not only which harmonisation category a specific processing component can perform (1.) but also what the different parameters of a Transformer refer to, i.e. the semantics of the inputs. Hence, the inputs need to be semantically annotated.

How this is handled in HUMBOLDT:

The semantic annotation of the inputs is stored as a mapping of the parameter names of the individual process to parameter names of parameters attached to the harmonisation category.

Example:

The harmonisation category SpatialReferenceSystemHarmonisation has the following generic method signature attached to it:

Harmonisation Category: SpatialReferenceSystemHarmonisation
Parameters: SourceSRS, TargetSRS, LayerToBeTransformed

Assume, there is a spatial reference system Transformer with the following signature:

Signature(target_crs, source_crs, layer)

In this case, the mapping or semantic annotation that is stored is the following:

target_crs → TargetSRS

source_crs → SourceSRS

layer → LayerToBeTransformed

A processing component is (according to the HUMBOLDT characterisation) a valid harmonisation processing component if it is tagged with a harmonisation category known to the framework and if it provides, for each of its mandatory inputs, a mapping to one of the generic parameters of that category.
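The validity rule above can be sketched as a short Python check. The dictionary of known categories, the function name and the argument layout are assumptions made for illustration; only the category name, its generic parameters and the example signature are taken from the text.

```python
# Harmonisation categories known to the framework and their generic parameters.
# Only the one category named in the text is listed here.
KNOWN_CATEGORIES = {
    "SpatialReferenceSystemHarmonisation": {
        "SourceSRS", "TargetSRS", "LayerToBeTransformed",
    },
}


def is_valid_harmonisation_transformer(category, mandatory_inputs, annotation):
    """Check the two conditions: known category tag, and every mandatory
    input mapped to a generic parameter of that category."""
    generic_params = KNOWN_CATEGORIES.get(category)
    if generic_params is None:
        return False
    return all(annotation.get(name) in generic_params
               for name in mandatory_inputs)


# The example Transformer with Signature(target_crs, source_crs, layer).
annotation = {
    "target_crs": "TargetSRS",
    "source_crs": "SourceSRS",
    "layer": "LayerToBeTransformed",
}

print(is_valid_harmonisation_transformer(
    "SpatialReferenceSystemHarmonisation",
    ["target_crs", "source_crs", "layer"],
    annotation,
))
```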

3. Not every harmonisation Transformer can solve every harmonisation problem concerning that category!

Assume the user requests data in a certain spatial reference system Y, and the HUMBOLDT catalogue (IGS) discovers a data source in spatial reference system X. The WDCS must then automatically identify processing components in its internal registry that are tagged with “spatial reference system transformation”. However, not every component identified transforms data from X to Y. Hence, the WDCS must perform some sort of matchmaking between data source, user request and processing component.

How this is handled in HUMBOLDT:

As described above, for each transformation category, we have a generic signature. These signatures usually have parameters that are not directly used for processing but control the algorithm that is executed. For example, the generic signature for spatial reference system harmonisation looks like this:

Harmonisation Category: SpatialReferenceSystemHarmonisation
Parameters: SourceSRS, TargetSRS, LayerToBeTransformed

The parameters that are not directly used for processing but control the algorithm are in this case SourceSRS and TargetSRS. These parameters determine which algorithm is executed inside the processing component. If the parameters of a specific service are annotated with these generic parameters, we assume that the user, when registering the service, provides a list of allowed values for these algorithm-control parameters. These allowed values then indicate to which concrete harmonisation problems a specific service can be applied. Assume that a service transforming only from EPSG:31467 to EPSG:4326 is registered to the framework. Then we assume the following mapping as part of the registration process:

target_crs → TargetSRS

source_crs → SourceSRS

layer → LayerToBeTransformed

Further, for each algorithm-control parameter, a set of allowed values is specified, which, in the example, looks like this:

AllowedValues(source_crs) = { “EPSG:31467” }

AllowedValues(target_crs) = { “EPSG:4326” }

It must be noted that these lists of allowed values must be based on a common and agreed-upon vocabulary, e.g. in the case of spatial reference system transformation, the EPSG-codes.
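The matchmaking between data source, user request and processing component can be sketched in Python as follows. The registry layout, the entry name and the function names are hypothetical. Treating a missing list of allowed values as "no restriction" is an assumption carried over from the meaning of empty constraints elsewhere in this specification; the text does not state it for algorithm-control parameters.

```python
def is_applicable(entry, category, source_srs, target_srs):
    """Does this registered Transformer solve source_srs -> target_srs?"""
    if entry["category"] != category:
        return False
    requested = {"SourceSRS": source_srs, "TargetSRS": target_srs}
    for local_name, generic_name in entry["annotation"].items():
        if generic_name not in requested:
            continue  # e.g. LayerToBeTransformed is data, not a control parameter
        allowed = entry["allowed_values"].get(local_name, set())
        # Assumption: an empty set of allowed values means "unrestricted".
        if allowed and requested[generic_name] not in allowed:
            return False
    return True


def find_transformers(registry, category, source_srs, target_srs):
    return [e["name"] for e in registry
            if is_applicable(e, category, source_srs, target_srs)]


registry = [
    {
        "name": "gk3_to_wgs84",  # hypothetical registry entry
        "category": "SpatialReferenceSystemHarmonisation",
        "annotation": {
            "target_crs": "TargetSRS",
            "source_crs": "SourceSRS",
            "layer": "LayerToBeTransformed",
        },
        "allowed_values": {
            "source_crs": {"EPSG:31467"},
            "target_crs": {"EPSG:4326"},
        },
    },
]

print(find_transformers(registry, "SpatialReferenceSystemHarmonisation",
                        "EPSG:31467", "EPSG:4326"))
print(find_transformers(registry, "SpatialReferenceSystemHarmonisation",
                        "EPSG:4326", "EPSG:31467"))
```

The comparison is a plain string match, which is why the lists of allowed values have to use an agreed-upon vocabulary such as EPSG codes.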

3.2 Workflows

The whole process of workflow design and construction outlined in this section is performed without any instance data and solely relies on the metadata on processing components as described in the previous section. The aim of this section is to give more details on HUMBOLDT workflows and how the metadata described in the previous section is handled within the framework.

3.2.1 Basic Workflow Design

A Basic Workflow (BW) is an abstract description of a chain of geoprocesses that together achieve a task that cannot be achieved by single data or processing services. A Basic Workflow does not contain any information on data services and is the result of the manual workflow design process. A Basic Workflow consists of Transformers, connected into a chain. A connection between two Transformers means that data is passed between the processes/WPS (represented by the Transformers) during execution. A Transformer whose output serves as input to another Transformer during execution is called a source Transformer. A Transformer which is connected to a source Transformer is called a target Transformer.

Basic Workflows are only allowed to have a single output. This means that they must synchronize into a single target Transformer in order to be valid. Therefore, the output of a Basic Workflow after execution will be the output of the last Transformer in the chain. For this reason, and since Transformers can only have a single output, HUMBOLDT workflows have a tree-like structure with the Transformers being the individual nodes and the last Transformer in the chain being the root node (Figure 12). Note that the arrows indicate the data flow during execution, while the information on the connections is actually stored in the opposite direction (i.e. a Transformer input stores a pointer to another Transformer whose output will provide the value at runtime).

Speaking in terms of the well-known workflow patterns presented in [3], HUMBOLDT workflows allow the sequence (the linear chaining of processing components, i.e. the basis of all workflow systems) and synchronisation (multiple subprocesses that converge into a single process) control flow patterns. The analysis of harmonisation processes within the HUMBOLDT Work package 7 has shown that these patterns are sufficient for the scenarios within HUMBOLDT. However, in future versions this model might be extended to additionally allow patterns such as conditional branching etc. Extending the WDCS workflow model would be possible without the need for changing the overall workflow architecture / model.

Figure 12: The structure of Basic Workflows

Further, Basic Workflows themselves have preconditions. These derive from the set of unsatisfied preconditions of all Transformers within the chain. A precondition is satisfied (indicated by red colour in Figure 12) if either a pointer to / from another Transformer in the chain is set or (in case of literal inputs) a value has been specified that should be used for instantiation during execution (e.g. a buffer distance). The unsatisfied preconditions within a BW (also called the leaf preconditions) are the entry points to a BW in the sense that during execution, data is needed for each input represented by an unsatisfied precondition. In case of unsatisfied Complex Preconditions, this data is derived from an external data source, such as a GML layer from a WFS. In case of unsatisfied Simple Preconditions, this data (a simple value such as a string or integer) must be manually set by the workflow designer. Therefore, unsatisfied Simple Preconditions can be satisfied by specifying a value, e.g. a buffer distance.
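The tree structure and the notion of leaf preconditions can be illustrated with a minimal Python sketch. The class and function names are hypothetical, and an unsatisfied precondition is represented simply by `None`; this is an illustration of the description above, not the WDCS data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Transformer:
    name: str
    # Precondition name -> source Transformer (pointer set), a literal value
    # (value specified), or None (unsatisfied precondition).
    inputs: Dict[str, object] = field(default_factory=dict)


def leaf_preconditions(root: Transformer) -> List[Tuple[str, str]]:
    """Collect (transformer name, precondition name) for every unsatisfied
    precondition, walking from the root (the last Transformer in the chain)
    along the stored pointers towards the inputs."""
    leaves = []
    for precondition, binding in root.inputs.items():
        if isinstance(binding, Transformer):
            leaves.extend(leaf_preconditions(binding))
        elif binding is None:
            leaves.append((root.name, precondition))
    return leaves


# A two-node workflow: Buffer feeds the "area" input of Select.
buffer_t = Transformer("Buffer", {"layer": None, "distance": 500})
select_t = Transformer("Select", {"paths": None, "area": buffer_t})

print(leaf_preconditions(select_t))
```

Because each input stores the pointer to its source Transformer, the whole workflow is reachable from the root node alone, which matches the storage direction described for Figure 12.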

3.2.1.1 Pre- / Postcondition Matching

Transformers can only be connected when they are syntactically compatible. This compatibility involves data types (e.g. a Transformer delivering a string cannot deliver input to a Transformer accepting integers) as well as further constraints on the data values (e.g. a unit of measure). The WDCS helps the user in the workflow design process by automatically checking compatibility between individual Transformers, based on their pre- and postcondition specifications (see footnote 1). Whenever two Transformers are connected, the postcondition of the first Transformer is matched/compared with the precondition for which it serves as input. This process is called Pre-/Postcondition Matching and results in an approval or rejection of the connection.

Matching Simple Conditions:

The need for matching Simple Conditions arises when a Transformer that delivers a simple value as output should serve as input to another Transformer that accepts a simple value. Matching Simple Conditions boils down to comparing the individual constraints within each simple condition. It consists of three steps:

1. Comparing the types based on the following rule: A Simple Postcondition A matches a Simple Precondition B if the type specified within A is a subtype of the type specified within B (e.g. a short integer can serve as input to a service accepting long integer values, but not vice versa).2

2. Comparing the Units of Measure: A Simple Postcondition A matches a Simple Precondition B if the UoM are equal (e.g. a process delivering kilometres does not match a process that accepts metres).

3. Comparing the set of allowed values: A Simple Postcondition A matches a Simple Precondition B if the set intersection of the sets of allowed values of A and B is not empty. Note that if either one or both of the input sets (the sets of allowed values of A and B) are empty, pre / post still match.3

Example:

1 Note that these compatibility checks only involve the constraints expressed within pre/post. Deciding whether the connection of Transformers is useful from a semantic/pragmatic point of view is still the responsibility of the workflow designer.

2 Note that there can be geoprocesses that operate on multiple types (i.e. polymorphic operations). Matching polymorphic operations is currently outside the scope of this specification.

3 This is the case because the meaning of empty constraints is that “everything is allowed”. For example, the meaning of an empty language constraint is that “every language is allowed”.

Figure 13 shows an example of matching simple conditions. The data described by the simple postcondition should serve as input for the simple precondition. This is possible, but only for a subset of their constraints.

Figure 13: Example: Matching Simple Conditions
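The three matching steps for Simple Conditions can be sketched as follows. This is a minimal illustration, not the WDCS implementation: the classes `SimpleCondition` and `SimpleConditionMatcher` are hypothetical, the unit of measure is reduced to a plain string, and the Java class hierarchy stands in for the generic simple type model (so the short/long example from the text is not reproduced literally; `Integer` and `Number` are used instead).

```java
import java.util.Collections;
import java.util.Objects;
import java.util.Set;

/** Hypothetical, simplified stand-in for a simple pre- or postcondition. */
final class SimpleCondition {
    final Class<?> type;              // simple datatype, modelled here as a Java class
    final String uom;                 // unit of measure, e.g. "m" or "km"; null if unspecified
    final Set<String> allowedValues;  // an empty set means "everything is allowed"

    SimpleCondition(Class<?> type, String uom, Set<String> allowedValues) {
        this.type = type;
        this.uom = uom;
        this.allowedValues = allowedValues;
    }
}

final class SimpleConditionMatcher {
    /** Returns true if the postcondition 'post' may serve as input for the precondition 'pre'. */
    static boolean matches(SimpleCondition post, SimpleCondition pre) {
        // Step 1: the postcondition type must be a subtype of the precondition type.
        if (!pre.type.isAssignableFrom(post.type)) {
            return false;
        }
        // Step 2: the units of measure must be equal.
        if (!Objects.equals(post.uom, pre.uom)) {
            return false;
        }
        // Step 3: an empty set allows everything; otherwise the intersection must be non-empty.
        if (post.allowedValues.isEmpty() || pre.allowedValues.isEmpty()) {
            return true;
        }
        return !Collections.disjoint(post.allowedValues, pre.allowedValues);
    }
}
```

In this sketch, a postcondition of type `Integer` in metres matches a precondition of type `Number` in metres, while the reverse direction or a kilometre/metre combination is rejected.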

Matching Complex Conditions:

Matching between Complex Conditions occurs when the output of a Transformer that delivers a complex value should serve as input to another Transformer that accepts a complex value, such as a GML layer. Similar to the matching of simple conditions, the matching of complex conditions boils down to the matching of each constraint within both complex conditions.

Again, this matching does not just deliver a simple yes/no (Boolean) answer, but results (in case of approval) in a new Complex Constraint holding all constraints for which the connection is valid. Hence, the new Complex Constraint can be considered as a set of constraints that restrict the data elements passed between the two Transformers. In the simplest case, where a constraint is just a set of allowed values (e.g. a set of allowed reference systems), the new constraint contains the values common to the sets of both input conditions (the set intersection). However, there are also more advanced constraints where the comparison is not that easy (e.g. Bounding Boxes).
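For the simplest kind of constraint (a set of allowed values), the derivation of the new constraint can be sketched as below. The class `AllowedValuesMatcher` is hypothetical, and the sketch assumes that the "empty means everything is allowed" rule stated for simple conditions also applies here.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

final class AllowedValuesMatcher {
    /**
     * Matches two allowed-value constraints (e.g. sets of allowed reference systems).
     * Returns the constraint to be stored with the connection, or Optional.empty()
     * if the connection has to be rejected.
     */
    static Optional<Set<String>> match(Set<String> post, Set<String> pre) {
        // Assumption: an empty set means "everything is allowed".
        if (post.isEmpty()) {
            return Optional.of(new HashSet<>(pre));
        }
        if (pre.isEmpty()) {
            return Optional.of(new HashSet<>(post));
        }
        // Otherwise the new constraint is the set intersection.
        Set<String> common = new HashSet<>(post);
        common.retainAll(pre);
        return common.isEmpty() ? Optional.<Set<String>>empty() : Optional.of(common);
    }
}
```

Constraints such as Bounding Boxes would need their own comparison logic and are not covered by this sketch.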

3.2.1.2 Constraint propagation

Storing the result of a pre-/postcondition matching (i.e. constraints on data elements for which a connection between Transformers is valid) is only useful if this information is used in the subsequent process. It must be ensured that only data elements that satisfy the constraints imposed on a connection pass that connection. Since we do not foresee storing these connection constraints and checking them at runtime in the Mediator Service, the only possibility to make use of the constraints on the connections is to propagate them throughout the workflow in the design phase. This process is called constraint propagation. Its overall goal is to push the constraints to the leaf preconditions of the BW to ensure that only data elements satisfying all constraints imposed on all connections between Transformers within that BW are used as input to the BW. The main purpose is to reduce the risk of runtime (and, to a certain extent, semantic) errors within the workflow during execution. If the constraints are not propagated to the leaf preconditions of the workflow, data may be used as input that satisfies the leaf preconditions (i.e. the first Transformers within the chain execute successfully) but does not satisfy the constraints imposed on some connections or Transformers within that workflow. This results in a runtime error somewhere within the workflow during execution.

The process of constraint propagation is made possible by the use of the shared constraints introduced above and becomes clearer in the following example.

3.2.2 Example: Basic Workflow Design

Figure 14 shows the workflow design process for the request on “all power plants within 5 KM of cities”. Constraints within the Conditions (pre- or post) that have the same colour indicate shared constraints. For example, the complex pre- and postcondition of the Buffer-Transformer share the constraint on the spatial reference system.

Figure 14: Example: The Basic Workflow Design process

As can be seen, the workflow designer connects the output of the Buffer Transformer to the input of the Intersection Transformer. The system automatically checks the compatibility of both Transformers, based on the pre- and postcondition specifications. In the example, the pre-/postconditions only contain a type constraint (e.g. the GML schema) and a constraint on the spatial reference system. Hence, the pre-/post matching includes a comparison of types (in the simplest case, this is just a comparison of URIs; a more advanced strategy employs subtype/supertype relationships) and a matching of allowed reference systems. Both Transformers can only “work” together if they operate on equal types (or at least types in a subtype relationship) and if the data passed between them adheres to the epsg:4326 reference system. We assume that the matching is successful and hence results in a new Complex Constraint that is stored with the connection, as shown in Figure 15.

Since the Buffer process does not perform any coordinate transformation, the established connection is only valid if the Buffer Transformer is executed on layers that adhere to epsg:4326. Hence, the connection information needs to be propagated to the input of the Buffer Transformer. This constraint propagation is possible because the Buffer pre- and postcondition share the SRS Constraint. It results in the deletion of the epsg:31467 reference system from the SRS Constraint of the complex precondition of the Buffer Transformer. Additionally, since the Intersection Transformer requires both input layers to adhere to a common reference system, the connection constraint needs to be propagated through the Intersection Transformer as well. The process of constraint propagation results in the sharing of the SRS constraint throughout the whole workflow (as indicated by the blue colour in Figure 15). This means that the whole workflow only executes successfully on layers whose geometries are specified using epsg:4326.
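The effect of shared constraints on propagation can be illustrated with a small sketch. It assumes, purely for illustration, that sharing is modelled by object identity: the Buffer pre- and postcondition reference the same hypothetical `SrsConstraint` instance, so narrowing it at the connection also narrows the leaf precondition.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical mutable SRS constraint; a shared constraint is one shared instance. */
final class SrsConstraint {
    final Set<String> allowed;

    SrsConstraint(String... codes) {
        this.allowed = new HashSet<>(Arrays.asList(codes));
    }

    /** Narrows this constraint to the values also allowed by 'other'; false if nothing remains. */
    boolean narrowTo(Set<String> other) {
        allowed.retainAll(other);
        return !allowed.isEmpty();
    }
}

final class PropagationExample {
    /** Returns the SRS values left on the Buffer precondition after connecting Buffer to Intersection. */
    static Set<String> run() {
        // The Buffer pre- and postcondition share one SRS constraint (same object).
        SrsConstraint shared = new SrsConstraint("epsg:4326", "epsg:31467");
        SrsConstraint bufferPre = shared;
        SrsConstraint bufferPost = shared;

        // In this sketch, the Intersection precondition accepts epsg:4326 only.
        Set<String> intersectionPre = new HashSet<>(Arrays.asList("epsg:4326"));

        // Matching the connection narrows the Buffer postcondition ...
        bufferPost.narrowTo(intersectionPre);

        // ... and, because the constraint is shared, the leaf precondition as well.
        return bufferPre.allowed;
    }
}
```

After `run()`, the Buffer precondition no longer lists epsg:31467, which corresponds to the deletion described in the example.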

Further, in order to ensure that the Basic Workflow can “answer” the request for “all power plants within a 5 km radius of cities”, the workflow designer needs to manually extend the workflow preconditions (the leaf preconditions) with additional constraints. These application specific constraints are indicated by red letters in Figure 15. They ensure that only data services delivering data on power plants (or cities, respectively) are used as input. Additionally, the workflow designer sets the value for the Buffer distance to five.

Figure 15: Example: The result of the Basic Workflow Design Process

The workflow is a valid BW since it only has a single output Transformer (Intersection) and each Transformer is connected to another Transformer within the chain.

3.2.3 Automated Creation of Executable Workflows

An executable workflow is automatically built out of a Basic Workflow in the presence of a concrete user request (i.e. product definition). An executable workflow incorporates all information necessary for the execution, such as the information on data services that deliver input. The differences between a Basic Workflow and an executable workflow are therefore the following:

- The Basic Workflow is built by a human user.

- The Executable Workflow is built automatically by the system on the basis of a Basic Workflow and end user constraints.

- The Executable Workflow might add additional harmonisation Transformers to a Basic Workflow.

- The Executable Workflow holds information on data services that provide suitable input.

The main difficulty in creating executable workflows is to ensure that all end user constraints are satisfied after the workflow execution. Since, in HUMBOLDT, the workflow design process (performed by the WDCS) is separated from the execution of the workflow (performed by the Mediator Service Component), it is necessary to validate a workflow against end user constraints already during workflow design and without any data present. Creating executable workflows out of Basic Workflows mainly consists of three automated steps, the last of which is optional:

1. Enriching the Basic Workflow with request specific constraints

2. Discovering data services that can provide suitable input to the workflow

3. Optional: Insert harmonisation Transformers

The goal of this process is to create an (executable) workflow out of a Basic Workflow whose output data will satisfy (after execution) all constraints imposed by the end user. Since no concrete data is present within this process, it relies solely on metadata of data sources (as delivered by the HUMBOLDT IGS) and harmonisation processing components, as described in section 3.1.3.

3.2.3.1 Enriching the Basic Workflow with request specific constraints

The simplest enrichment strategy uses the user constraints to enrich the Basic Workflow preconditions (the leaf preconditions). For example, if the user requests data in a certain reference system, this enrichment strategy ensures that only data adhering to this specific reference system is used as input to the workflow. At first sight, this seems sufficient, but it leads to the following problem: the user constraints refer to the output of the workflow. When the user constraints are satisfied by the input of the workflow, it is not ensured that the output still satisfies them after workflow execution. Even if all input to the Basic Workflow adheres to epsg:4326, the output can still be specified in another spatial reference system (in case there is a spatial reference system transformation encapsulated within the workflow). Hence, in order to ensure that the workflow output satisfies the user constraints, a more sophisticated enrichment strategy is needed. This can be described as follows:

The incoming user request is compared with the postcondition of the last/final Transformer within the chain. This matching is the same as the matching of conditions as described in section 3.2.1.1. If this match …

a. …succeeds, it results in a new condition, which is, according to the rule of constraint propagation described above, propagated throughout the workflow. For example, if the user requests a certain spatial reference system and the last transformer in the workflow does not specify a constraint on the spatial reference system and shares it with the inputs, the constraint is propagated to the inputs (and further).

b. …fails, the BW in its current form can not satisfy the user request. In this case, automated harmonisation at the end of the BW is needed. For example, if the last transformer in the workflow delivers data only within a specific schema and the user requests a different one, a schema translation is required.

Note that both cases can be recognized automatically by the WDCS due to shared constraints. If a constraint on the output of the last transformer in the workflow is not shared with one of its inputs, it is independent of the inputs and hence must conform to the user request directly; if it does not, harmonisation is required. If a constraint on the output is shared with at least one of its inputs / preconditions, it depends on the inputs and hence, if it does not conform to the user request directly, the user constraint can be propagated.
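One possible reading of cases a. and b. for a single allowed-value constraint is sketched below. The names `EnrichmentAction` and `RequestEnrichment` are hypothetical. The sketch assumes that an empty (unspecified) output constraint matches any request, and that a failed match always leads to harmonisation at the end of the BW.

```java
import java.util.Set;

enum EnrichmentAction { NO_ACTION, PROPAGATE_TO_INPUTS, HARMONISE_OUTPUT }

final class RequestEnrichment {
    /**
     * Decides how one user constraint is handled.
     *
     * @param outputAllowed   values allowed by the postcondition of the last Transformer
     *                        (empty means unspecified, i.e. everything is allowed)
     * @param requested       the value requested by the user, e.g. a reference system
     * @param sharedWithInput whether the output constraint is shared with at least one input
     */
    static EnrichmentAction decide(Set<String> outputAllowed, String requested,
                                   boolean sharedWithInput) {
        boolean matches = outputAllowed.isEmpty() || outputAllowed.contains(requested);
        if (!matches) {
            // Case b: the BW cannot satisfy the request, harmonise at the end of the BW.
            return EnrichmentAction.HARMONISE_OUTPUT;
        }
        // Case a: the match succeeds; a shared constraint is propagated to the inputs.
        return sharedWithInput
                ? EnrichmentAction.PROPAGATE_TO_INPUTS
                : EnrichmentAction.NO_ACTION;
    }
}
```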

3.2.3.2 Discovery of data services

After enriching the BW with request specific constraints, suitable input data needs to be discovered. For this, each leaf complex precondition of the BW is used as a web service / data source discovery query. Discovery is performed via the Information Grounding Service (IGS). The details on this collaboration between WDCS and IGS can be found in the IGS specification.

The discovery process results in a number of web services for each leaf precondition of the BW. If one of the discovered web services is a perfect match (i.e. the data delivered by that service satisfies all constraints imposed by the precondition), the web service / data source is attached to the precondition.

If none of the web services is a perfect match, one of them is picked.4 Since this grounding service violates some of the constraints, it cannot be directly attached to the precondition. Therefore, automated harmonisation is needed.
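The default selection strategy (the candidate with the fewest violated constraints) can be sketched as follows. `Candidate` and `GroundingSelection` are hypothetical names, and the number of violated constraints is assumed to be already known for each discovered service.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical discovery result for one leaf precondition. */
final class Candidate {
    final String serviceUrl;        // hypothetical service endpoint
    final int violatedConstraints;  // number of precondition constraints the service violates

    Candidate(String serviceUrl, int violatedConstraints) {
        this.serviceUrl = serviceUrl;
        this.violatedConstraints = violatedConstraints;
    }

    /** A perfect match satisfies all constraints and can be attached directly. */
    boolean isPerfectMatch() {
        return violatedConstraints == 0;
    }
}

final class GroundingSelection {
    /** Picks the candidate with the fewest violated constraints, if any candidate exists. */
    static Optional<Candidate> select(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingInt((Candidate c) -> c.violatedConstraints));
    }
}
```

If the selected candidate is not a perfect match, harmonisation Transformers have to be inserted between the data source and the precondition.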

3.2.3.3 Automated Harmonisation

As described above, harmonisation either harmonises data sources to workflow preconditions or harmonises the output of the workflow such that it satisfies the user request. The WDCS holds a set of harmonisation categories. For each of these categories, the WDCS holds knowledge on when to apply a processing component implementing that harmonisation category. This knowledge basically consists of:

4 Usually this is the one with the least number of violated constraints. But more sophisticated strategies can be imagined that take into consideration the types of violated constraints as well as the harmonisation transformations available within the repository.

- Which constraint (of the HUMBOLDT constraint model) / data source combinations does this harmonisation category solve?

- Which constraint combinations does this harmonisation category solve?

Since every harmonisation category comes with a generic signature (similar to those specified in the WPS Application Profiles) that every individual processing component implementing that transformation category must reference / be annotated with, the WDCS is able to automatically handle them.

3.2.4 Example: Executable Workflow Creation5

Figure 16 shows the Basic Workflow (abstracting from the processing chain within the BW) that has been enriched with a request specific language constraint and for which discovery has already been performed. The user requested data specified in German. This constraint has been propagated to the leaf preconditions of the Basic Workflow. This was possible since unspecified constraints are treated as shared constraints and obviously, the language constraint was not specified within the BW.

Additionally, the user requested the output layer to have geometry specified using the Universal Transverse Mercator (UTM) reference system. Since the BW only executes successfully on epsg:4326 data, the SRS constraint imposed by the user cannot be propagated throughout the workflow. Hence, SRS harmonisation at the end of the BW is needed in order to satisfy the request.

Additionally, for one of the two complex preconditions (layers), the discovered data source does not satisfy all constraints imposed by the precondition. The discovered data source delivers layers in English, whereas the precondition requires German (coming from the request via constraint propagation). Therefore, language harmonisation is required between the data source and the precondition.

Figure 16: Example: A Basic Workflow without harmonization Transformers

Inserting the required harmonisation Transformers results in the executable workflow shown in Figure 17. All data sources have been attached and all necessary harmonisation Transformers have been automatically inserted.

5 Large versions of Figure 16 and 17 can be found in Annex C

Figure 17: Example: A Workflow ready for execution (Executable Workflow)

3.3 Interactions of the WDCS with other framework components

As described in Section 2, the Workflow Design and Construction Service is a component that offers two main functionalities. First, it enables data custodians to create and manage Basic Workflows via a graphical user interface. Second, it provides the Mediator Service Component with executable workflow descriptions, whose execution results in data requested by end users. Figure 18 shows the interactions of the WDCS with other components.

Figure 18: Component Diagram of the Workflow Design and Construction Service

This interaction is mainly performed within the Use Cases WDCS 01 (Basic Workflow Design/Creation) and WDCS 06 (Workflow Execution). The aim of the following sections is to describe these interactions on the level of interfaces exposed by the WDCS to other framework components.

3.3.1 Interactions within UC 01

The interaction of the WDCS with other components within UC 01 is shown in Figure 19. The interaction involves the graphical user interface (used by the DATA_CUSTODIAN) and the Repository Manager Module (RM).

Figure 19: The main interactions within UC 01

1. The DATA_CUSTODIAN explores the Transformers available in the repository via the graphical user interface of the WDCS, called the Workflow Frontend. The DATA_CUSTODIAN passes a keyword, such as “Buffer”, to the repository and receives a list of Transformers whose textual process description contains that keyword. Using the GUI, the DATA_CUSTODIAN connects several Transformers to build a Basic Workflow. The whole process of pre/post matching and constraint propagation described in chapter 3 happens within this step.

2. To finish the Basic Workflow design process, the DATA_CUSTODIAN calls the RM operation responsible for storing Basic Workflows. The BW validation that checks whether the workflow is valid (i.e. only has a single output Transformer etc.) is encapsulated within this operation.

3.3.2 Interactions within UC 06

The main interaction of the WDCS with other framework components takes place, when a user requests data. Figure 20 shows the information flow within the HUMBOLDT framework in the presence of a user request (in this case, an OGC WMS request). Note that the diagram focuses on the interactions of the WDCS and therefore does not consider other components that do not directly interact with the WDCS, such as the Context Service Component.

Figure 20: The main interactions within UC 06

The following is what happens within the process shown in Figure 20: The Mediator passes a request (formalised as a Mediator Complex Request, see HUMBOLDT Commons Specification) to the Workflow Generator. WG passes the MCR directly to RM for retrieving a Basic Workflow. Within WG, the BW is enriched with request specific constraints and harmonisation Transformers are automatically inserted at the end of the BW as described in section 3.2.3. While iterating over all preconditions, the WG contacts the IGS for suitable grounding services for each leaf precondition of the Basic Workflow. After inserting required harmonisation Transformers at the beginning of the chain, the executable workflow is finished and finally dispatched to the Mediator Service.

3.3.3 Workflow Generator (WG) Module

Full Name: Mediation Tier → Workflow Design and Construction Service (WDCS) → Workflow Generator (WG) Module

The Workflow Generator (WG) provides two operations externally visible for other components.

1. Responsibilities of the Module

• Provide executable workflows to the Mediator Service Component.

2. Collaboration

- Is called by the Mediator Service for the delivery of executable workflows

- Calls RM module to retrieve Basic Workflows

- Calls the IGS to retrieve grounding/data services

3. Actions fulfilled by this Component/Module

- Enriches the BW with request specific constraints, as described in section 3.2

- Inserts harmonisation Transformers at the beginning or end of a BW. This process is described in section 3.2.3.

- Attaches data services to the leaf preconditions of a Basic Workflow.

4. Interface overview

The WG interface provides the following two operations to clients.

Table 1: Summary of the Workflow Generator Interface

Return type Operation

Workflow (Executable)

getWorkflow(MediatorComplexRequest mcr)

This is the main operation called to construct and retrieve a workflow by the Mediator. Within this method, the whole process of Basic Workflow retrieval, enrichment (with request specific constraints), insertion of harmonisation Transformers and the discovery of suitable grounding/data services is started. This process is described in section 3.2.3 in detail.

3.3.4 The Repository Manager (RM) Module

Full Name: Mediation Tier → Workflow Design and Construction Service (WDCS) → Repository Manager (RM)

The Repository Manager is a data store for Transformers and Basic Workflows. It is used by the WDCS as a backend storage module during design time.

1. Responsibilities of the Component/Module

- Manages workflows and Transformers, i.e. allows creating, editing and deleting them.

- Provides the Basic Workflow in presence of a concrete user request.

2. Collaboration

• Used by the DATA_CUSTODIAN (via the WDCS GUI) to manage Basic Workflows and Transformers.

3. Actions fulfilled by this Component/Module

• Adding Basic Workflows and Transformers

• Deleting Basic Workflows

• Updating Basic Workflows and Transformers

• Exploring Basic Workflows and Transformers

• Validating Basic Workflows according to the HUMBOLDT workflow model described in 3.2.

4. Interface overview

Return type Operation

List<Transformer> exploreTransformers(String keyword)

This method is used to find out which Transformers are already available in the repository given a keyword. The keyword is matched against the process description of each Transformer and Basic Workflow within the repository.

Workflow getBasicWorkflow(Concept featureType)

This is the method used by the Workflow Generator to retrieve the Basic Workflow identified using the Feature Type associated with it. It is assumed, that a workflow is uniquely identified by a Feature Type concept from a conceptual schema.

storeBasicWorkflow(Concept FeatureType, Workflow workflow, ProcessDescriptions pd)

This operation is used at design-time to create a new Basic Workflow. The process description is a textual description of the process offered by this Basic Workflow, created by the DATA_CUSTODIAN. The input parameter FeatureType is the newly created feature type from a conceptual schema that serves as a unique identifier for the BW.

Boolean removeBasicWorkflow(Concept featureType)

This operation is used to delete the Basic Workflow from the repository. A workflow is identified uniquely by a concept from some conceptual schema, representing the Feature Type associated with it.

Transformer createTransformer(URL wpsUrl, String ProcessIdentifier, ProcessDescription pd)

This method is used by the DATA_INTEGRATOR when registering a new WPS process to the HUMBOLDT framework.

UUID updateTransformer(UUID id, Transformer transformer)

This method updates an existing Transformer

4 Information Viewpoint

The aim of this section is to introduce the different data structures used by the WDCS. First, it has an internal model used for composition. Second, it delivers executable workflows to workflow engines, such as the HUMBOLDT Mediator Service.

4.1 The Workflow Interface

Figure 21: The Workflow Interface

Return type Operation

TransformerDescription getTerminalTransformer()

Returns the terminal Transformer, i.e. the one to be executed last.

Set<TransformerDescription> getTransformers()

Returns a set of all Transformers in this workflow.

WorkflowMetadata getWorkflowMetadata()

Returns metadata on this workflow such as creator, creation date etc.

Set<String> getKeywords()

Returns keywords describing the workflow.

boolean isValidWorkflow()

Returns true, if this is a valid workflow (i.e. if the Transformers in this workflow form a tree with the terminal transformer being the root node).

boolean isExecutableWorkflow()

Returns true if this workflow is potentially executable. This is the case if, for all Transformers in this workflow, getStatus() returns Ready.
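The validity rule behind isValidWorkflow() (the Transformers form a tree with the terminal Transformer as root) can be sketched as a graph check. The representation below is an assumption made for illustration: Transformers are identified by plain strings, and `inputsOf` maps each Transformer to the Transformers feeding its preconditions. `WorkflowTreeCheck` is a hypothetical helper, not part of the Workflow interface.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class WorkflowTreeCheck {
    /**
     * Returns true if all Transformers listed in 'inputsOf' form a tree rooted at 'terminal'.
     *
     * @param terminal the ID of the terminal Transformer
     * @param inputsOf maps a Transformer ID to the IDs of the Transformers providing its inputs
     */
    static boolean isTree(String terminal, Map<String, List<String>> inputsOf) {
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(terminal);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (!visited.add(current)) {
                return false; // reached twice: a cycle or a Transformer feeding two consumers
            }
            List<String> inputs = inputsOf.getOrDefault(current, Collections.<String>emptyList());
            for (String input : inputs) {
                stack.push(input);
            }
        }
        // Every Transformer must be connected to the chain ending in the terminal Transformer.
        return visited.containsAll(inputsOf.keySet());
    }
}
```

For the example workflow, `isTree("intersection", ...)` holds when the only entry with inputs is Intersection, fed by Buffer.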

4.2 Transformer interfaces

Figure 22: Transformer Interfaces

4.2.1 Transformer

Return type Operation

UUID getID()

Returns the UUID of this transformer.

String getName()

Returns a name of this Transformer, e.g. “buffer”, “overlay”. This name can be used for keyword-search in the repository.

4.2.2 TransformerDescriptionDTO

Return type Operation

Set<Precondition> getPreconditions()

Returns a set of preconditions of this transformer.

Postcondition getPostcondition()

Returns the postcondition of this Transformer.

GroundingInformation getGroundingInformation()

Returns the grounding information of this Transformer. In case of a java-Transformer, it returns a java class name. In case of a WPS Transformer, it returns the WPS URL and the ProcessIdentifier as it appears in the GetCapabilities response of the WPS.

Set<String> getKeywords()

Returns a set of keywords, usually describing the functionality of this Transformer.

4.2.3 HarmonisationTransformerDescriptionDTO

Return type Operation

HarmonisationCategory getHarmonisationCategory()

Returns the harmonisation category this Transformer belongs to.

Map<String, String> getParameterMapping()

This method returns a mapping of the parameter names of the individual process to parameter names of parameters attached to the harmonisation category. In an abstract sense, this method returns the semantic annotation of the inputs. It is required for dynamic binding / automated harmonisation. Example:

The harmonisation category SpatialReferenceSystemHarmonisation has the following generic method signature attached to it:

signature(SourceSRS, TargetSRS, LayerToBeTransformed).

With each name of each parameter, a certain meaning is attached (which is implemented within the WDCS program code). In order to enable the WDCS to dynamically bind to a certain harmonisation Transformer, it must know how to instantiate the inputs. This knowledge has to be provided by a human who registers the Transformer and is then formalised / stored in the mapping returned by this method. Assume there is a spatial reference system Transformer with the following signature:

Signature(target_crs, source_crs, layer)

In this case, the mapping that is stored and returned via this method is:

target_crs → TargetSRS

source_crs → SourceSRS

layer → LayerToBeTransformed

boolean isValidHarmonisationTransformer()

Returns true if this Transformer is a valid harmonisation Transformer for its harmonisation category. In principle, this is the case if the mapping that is returned via getParameterMapping() contains a value for each mandatory input of the individual Transformation Service. Assume there is a mandatory input of an individual harmonisation service that is not mapped to (semantically annotated with) an input of the generic signature. In this case, the WDCS does not know with which values to instantiate this input and is therefore not able to deliver this information to the Mediator Service. Hence, the MS cannot dynamically bind to / execute that service.
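The validity check can be sketched with the SRS example from above. `HarmonisationMappingCheck` is a hypothetical helper, and the sets of mandatory inputs and generic parameter names are assumed to be available as plain strings.

```java
import java.util.Map;
import java.util.Set;

final class HarmonisationMappingCheck {
    /**
     * Returns true if every mandatory input of the individual Transformer is mapped
     * to a parameter of the generic signature of its harmonisation category.
     *
     * @param mandatoryInputs   parameter names of the individual process that must be set
     * @param parameterMapping  individual parameter name mapped to generic parameter name
     * @param genericParameters parameter names of the generic signature
     */
    static boolean isValid(Set<String> mandatoryInputs,
                           Map<String, String> parameterMapping,
                           Set<String> genericParameters) {
        for (String input : mandatoryInputs) {
            String generic = parameterMapping.get(input);
            // An unmapped mandatory input cannot be instantiated by the WDCS.
            if (generic == null || !genericParameters.contains(generic)) {
                return false;
            }
        }
        return true;
    }
}
```

With the mapping target_crs → TargetSRS, source_crs → SourceSRS, layer → LayerToBeTransformed, the check succeeds; an additional unmapped mandatory input would make it fail.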

4.2.4 TransformerDescription

Return type Operation

UUID getUniqueTransformerID()

Returns the UUID of this transformer unique to a workflow. This is required, in case a single Transformer appears twice in a workflow. In such case, the different roles within a workflow must be uniquely identifiable.

TransformerStatus getStatus()

Returns Ready if each precondition of this Transformer returns Satisfied. In any other case, it returns Incomplete.

4.3 Pre-/Postconditions interfaces

This section gives the data structures for pre- and postconditions in HUMBOLDT.

4.3.1 Data Structure for Preconditions

Figure 23 shows the data model for preconditions.

Figure 23: Preconditions

Return type Operation

String getName()

Returns the name of the parameter / input. For example, if there is a process with a signature buffer(bufferDistance int, layerToBeBufferd Layer), the precondition representing the bufferDistance returns “bufferDistance” when getName() is called.

PreconditionStatus getStatus()

Returns the status of this precondition, i.e. whether there is some data specified that can serve as input. In case of literal, this returns Satisfied if there is a value (getValue()) or a pointer to another transformer (getResultPointer()) set. In case of complex, it returns Satisfied in case there is a pointer to another transformer or data source set (getResultPointer()). In all other cases, getStatus() returns Unsatisfied.

String getValue()

Returns the value if it is set. For example, the buffer distance.

java.class getDatatype()

Returns the simple datatype of this precondition, e.g. in case of a buffer distance Integer or Double. Since the HUMBOLDT processing aims at being independent from encodings of data types (e.g. XML, java), the datatype that is returned conforms to a generic type model for simple datatypes.

org.jscience.physics.units.SI getUoM()

Returns the unit of measure of this literal input. For example, in case of a buffer distance, it returns whether the distance is interpreted within the service as kilometres or metres.

Set<String> getSetOfAllowedValues()

Returns a set of strings representing the allowed values for this input. For example, if the input controls the algorithm to be used within the service (e.g. a specific spatial reference system transformation), there is usually a fixed set of allowed inputs. This method returns this set.

ResultPointer getResultPointer()

Returns a pointer to where the data for this complex input (e.g. WFS layer etc.) should be derived from. Points either to data source or to another transformer. Hence, the result pointer is essentially the place, where the workflow information (the connection between individual transformers or data services) is stored.

Map<UUID, Constraint> getConstraints()

Returns a mapping of IDs to constraints that constrain the allowed input data sources using the HUMBOLDT constraint model. The IDs are used to store the cross-parameter constraints or cross input/output constraints as described previously. For example, if a WPS process has two inputs that must be in the same schema, then the schema constraint of both inputs would have the same ID. This allows the WDCS to automatically attach only data sources (or other transformers) that do not violate the cross-parameter constraint. Further, it allows the WDCS (via cross input / output constraints) to recognize how the output of a transformation relates to the inputs, for example whether the output will have (after execution) the same schema as the input.

Range getRange()

Returns a range of allowed values. Can only be applied if the datatype of this LiteralPrecondition is a number.
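The shared-ID convention of getConstraints() can be illustrated with a minimal Java sketch. Only the Map<UUID, Constraint> shape is taken from the operation above; the Constraint and SchemaConstraint types and the helper sharedConstraintIds are hypothetical stand-ins and not part of the HUMBOLDT constraint model.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Illustrative only: all names in this class are assumptions made for the example. */
final class CrossParameterConstraintSketch {

    /** Hypothetical stand-in for a constraint of the HUMBOLDT constraint model. */
    interface Constraint {
        String describe();
    }

    /** Hypothetical constraint stating that the data must conform to a (not yet bound) schema. */
    static final class SchemaConstraint implements Constraint {
        @Override
        public String describe() {
            return "schema";
        }
    }

    /**
     * Returns the constraint IDs that occur on both inputs. Following the convention
     * described above, a shared ID marks a cross-parameter constraint, i.e. both
     * inputs must satisfy the constraint in the same way (e.g. use the same schema).
     */
    static Set<UUID> sharedConstraintIds(Map<UUID, Constraint> first,
                                         Map<UUID, Constraint> second) {
        Set<UUID> shared = new HashSet<>(first.keySet());
        shared.retainAll(second.keySet());
        return shared;
    }
}
```

A component attaching data sources could use such a lookup to find out which constraints have to be checked jointly for two inputs before a connection is approved.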

4.3.2 Data Structure for Postconditions

Postconditions refer to the output of a service. Therefore, they do not hold a value (in case of literal) or a pointer (ResultPointer) to a data source / another transformer where to derive the values from. The meaning of the other operations as they appear in Figure… is equal to the meaning as described for preconditions.


Figure 24: Postconditions

4.4 Data Structure for Workflow Exchange

The WDCS delivers executable workflows to the MS in a data structure that holds all execution-relevant data as described in section 2.6.1. Since this data structure is used by both the WDCS and the Mediator Service, its description is part of the HUMBOLDT Commons specification. However, a short description is included here.

What is passed from the WDCS to the MS is a set of Transformer descriptions according to the data model shown in the above Figure. Each Transformer description contains information on…:

… how to access the processing components that are part of a workflow. Within HUMBOLDT, there are two types of processes: they are either implemented directly in Java or encapsulated in a WPS.

1. Java Transformer: If a Transformer is implemented in Java, it must be accessible to the MS. Since it implements an interface known to the MS, the MS can handle and execute it. The only information needed to identify the Transformer is the name of the Java class that holds the implementation. This information is passed via JavaClassName.


2. WPS Transformer: If the process is offered via WPS, the WPS URL (WPS_URL) and the ProcessIdentifier (as it appears in the GetCapabilities response of the WPS, i.e. WPS_ProcessIdentifier) suffice.

… how to instantiate the formal (named) parameters. This information is essentially a mapping of named input parameters to something that can be used by the mediator to instantiate the parameters. The range of the mapping can have three different forms. The mapping maps a named input parameter either to…:

1. … a value, e.g. 5 or “input string” (i.e. value)

2. … a named output of another processing component or (since we assume processing components have a single output) the name / identifier of another processing component (i.e. transformerID)

3. … a pointer to a data source, e.g. a WFS request (i.e. pointerToDS)
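The two access variants and the three mapping forms listed above can be sketched as a small Java data model. This is a sketch under assumptions and not the data structure defined in the HUMBOLDT Commons specification: the class, field and method names are hypothetical and only mirror the elements JavaClassName, WPS_URL, WPS_ProcessIdentifier, value, transformerID and pointerToDS named in the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a hypothetical in-memory form of a Transformer description. */
final class WorkflowExchangeSketch {

    /** The three forms a named input parameter can be mapped to. */
    enum BindingKind { VALUE, TRANSFORMER_ID, POINTER_TO_DS }

    /** Binding of one named input parameter. */
    static final class ParameterBinding {
        final BindingKind kind;
        final String content;

        private ParameterBinding(BindingKind kind, String content) {
            this.kind = kind;
            this.content = content;
        }

        static ParameterBinding value(String literal) {
            return new ParameterBinding(BindingKind.VALUE, literal);
        }

        static ParameterBinding transformerId(String id) {
            return new ParameterBinding(BindingKind.TRANSFORMER_ID, id);
        }

        static ParameterBinding pointerToDs(String request) {
            return new ParameterBinding(BindingKind.POINTER_TO_DS, request);
        }
    }

    /** Access information plus parameter mapping of a single Transformer. */
    static final class TransformerDescription {
        final String transformerId;
        final String javaClassName;        // set for Java Transformers, otherwise null
        final String wpsUrl;               // set for WPS Transformers, otherwise null
        final String wpsProcessIdentifier; // set for WPS Transformers, otherwise null
        final Map<String, ParameterBinding> inputs = new LinkedHashMap<>();

        private TransformerDescription(String transformerId, String javaClassName,
                                       String wpsUrl, String wpsProcessIdentifier) {
            this.transformerId = transformerId;
            this.javaClassName = javaClassName;
            this.wpsUrl = wpsUrl;
            this.wpsProcessIdentifier = wpsProcessIdentifier;
        }

        static TransformerDescription javaTransformer(String transformerId, String javaClassName) {
            return new TransformerDescription(transformerId, javaClassName, null, null);
        }

        static TransformerDescription wpsTransformer(String transformerId, String wpsUrl,
                                                     String wpsProcessIdentifier) {
            return new TransformerDescription(transformerId, null, wpsUrl, wpsProcessIdentifier);
        }

        boolean isWps() {
            return wpsUrl != null;
        }

        /** Maps a named input parameter to a value, a Transformer or a data source. */
        TransformerDescription bind(String parameterName, ParameterBinding binding) {
            inputs.put(parameterName, binding);
            return this;
        }
    }
}
```

In this sketch, a buffer process offered via WPS would bind its distance parameter to a value and its feature input to the transformerID of the preceding Transformer, which in turn binds its own input to a pointerToDS.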


5 Summary and Outlook

This specification described the Workflow Design and Construction Service, the HUMBOLDT component responsible for the management of geospatial workflows. It covered the strategy for workflow design and construction as well as the interfaces exposed to other framework components.


6 References

[1] Open Geospatial Consortium Inc., OpenGIS Web Processing Service (Discussion Paper), 2007.

[2] Open Geospatial Consortium Inc., OpenGIS Geography Markup Language (GML) Encoding Specification, Version 3.2.1, 2007.

[3] W.M.P. van der Aalst, A.H.M. ter Hofstede, B. Kiepuszewski, and A.P. Barros, “Workflow Patterns,” Distributed and Parallel Databases, vol. 14, July 2003, pp. 5–51.


7 Annex

Annex A – Use Case Descriptions

UC WS01 – Create Basic Workflow

Identifier of the use case: UC WS01.

Version: 1.1, 10.11.2009

Description

This Use Case describes the process of creating a Basic Workflow by the DATA_CUSTODIAN. A Basic Workflow is built using the Workflow Component GUI.

Actors

• DATA_CUSTODIAN
• WDCS_SYSTEM

Initial Conditions

No Basic Workflow exists yet for the DATA_CUSTODIAN's task.

There must be Transformers available in the repository.

Final Results

A new Feature Type representing the output of the Basic Workflow is created, and the Basic Workflow is stored in the workflow repository.

Processing

Main Process: Basic Workflow creation process

1. DATA_CUSTODIAN selects multiple (>=1) Transformers from the repository
2. DATA_CUSTODIAN connects two individual Transformers at a time to build a Basic Workflow
   a. WDCS_SYSTEM checks whether the two Transformers can be connected and returns approval or rejection
3. DATA_CUSTODIAN updates the Basic Workflow’s preconditions with Use Case specific constraints (e.g. thematic constraints)
4. DATA_CUSTODIAN sets the values of the Basic Workflow’s simple preconditions, e.g. a Buffer Distance
5. DATA_CUSTODIAN repeats steps 1 to 4 until the Basic Workflow is finished
6. DATA_CUSTODIAN stores the Basic Workflow in the repository together with a Feature Type description that uniquely identifies the Basic Workflow
   a. WDCS_SYSTEM checks whether the Basic Workflow is valid (e.g. whether each Transformer has at least one connection to another Transformer, whether the Basic Workflow only has a single output etc.)
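The two example checks named in step 6.a can be sketched in Java as follows. This is a minimal illustration and not the WDCS implementation: the Connection class and the isValid method are hypothetical, a Basic Workflow is reduced to Transformer identifiers and directed connections, and a workflow consisting of a single Transformer is assumed to be valid without any connection.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a simplified validity check for a Basic Workflow. */
final class BasicWorkflowValiditySketch {

    /** Directed data flow: the output of 'from' feeds an input of 'to'. */
    static final class Connection {
        final String from;
        final String to;

        Connection(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    static boolean isValid(Set<String> transformerIds, List<Connection> connections) {
        if (transformerIds.isEmpty()) {
            return false;
        }
        Set<String> connected = new HashSet<>();
        Set<String> consumed = new HashSet<>(); // Transformers whose output is used elsewhere
        for (Connection c : connections) {
            if (!transformerIds.contains(c.from) || !transformerIds.contains(c.to)) {
                return false; // connection refers to a Transformer outside the workflow
            }
            connected.add(c.from);
            connected.add(c.to);
            consumed.add(c.from);
        }
        // Check 1: every Transformer has at least one connection
        // (assumption: not required if the workflow consists of a single Transformer).
        if (transformerIds.size() > 1 && !connected.containsAll(transformerIds)) {
            return false;
        }
        // Check 2: exactly one Transformer output is not consumed, i.e. a single workflow output.
        return transformerIds.size() - consumed.size() == 1;
    }
}
```

A workflow with the connection srs → buffer passes both checks, whereas two unconnected Transformers, or one Transformer feeding two terminal Transformers, would be rejected.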

Alternative processes

None

Exceptional situations

If the validity check (step 6.a) fails, WDCS_SYSTEM informs DATA_CUSTODIAN, who then edits the Basic Workflow again to remove the defect.

Processed data

None


Generated Data

Basic Workflow

Requirements

No specific requirements have been defined at the moment.

Rules of Processing

No specific rules of processing have been defined at the moment.

UC WS02 – Edit Basic Workflow

Identifier of the use case: UC WS02.

Version: 1.0, 10.11.2009

Description

This Use Case describes the process of editing an existing Basic Workflow. This is done by the DATA_CUSTODIAN via the GUI of WDCS_SYSTEM.

Actors

• DATA_CUSTODIAN
• WDCS_SYSTEM

Initial Conditions

The Basic Workflow to be edited must exist in the repository.

Final Results

Processing

Main Process: Basic Workflow creation process

1. DATA_CUSTODIAN browses the WDCS_SYSTEM repository for Basic Workflows
2. DATA_CUSTODIAN selects a Basic Workflow from the repository
3. DATA_CUSTODIAN edits the selected Basic Workflow; this involves the following steps (all of them are optional):
   a. DATA_CUSTODIAN deletes the Basic Workflow
   b. DATA_CUSTODIAN changes the Basic Workflow’s preconditions (adding/deleting/editing constraints)
   c. DATA_CUSTODIAN changes the values of the Basic Workflow’s simple preconditions, e.g. a Buffer Distance
   d. DATA_CUSTODIAN adds Transformers
   e. DATA_CUSTODIAN adds new / changes connections between Transformers (i.e. the data flow)
   f. DATA_CUSTODIAN deletes Transformers

Alternative processes

None

Exceptional situations

None

Processed data

None

Generated Data


An updated Basic Workflow.

Requirements

No specific requirements have been defined at the moment.

Rules of Processing

No specific rules of processing have been defined at the moment.

UC WS03 – Register Transformer

Identifier of the use case: UC WS03.

Version: 1.0, 10.11.2009

Description

This Use Case describes the process of registering a new Transformer with the HUMBOLDT framework. This is done by the DATA_INTEGRATOR via the GUI of WDCS_SYSTEM.

Actors

• DATA_INTEGRATOR
• WDCS_SYSTEM

Initial Conditions

DATA_INTEGRATOR wants to register a process (e.g. spatial reference system transformation) to the framework. This process is either encapsulated within a WPS or directly implemented on the MS platform.

Final Results

A HUMBOLDT internal representation (Transformer description) of the WPS / java process, involving HUMBOLDT internal representations of pre- and postconditions. This newly created Transformer can then be used by DATA_CUSTODIAN to create new Basic Workflows.

Processing

Main Process: Basic Workflow creation process

1. DATA_INTEGRATOR registers with WDCS_SYSTEM either
   a. the WPS URL and the identifier of the process, as it appears in the GetCapabilities document of the WPS, if the process is a WPS process
      1. WDCS_SYSTEM automatically parses the process description, as it appears in the response to the DescribeProcess request, into the HUMBOLDT internal Transformer model.
   b. or the Java class name of the Java Transformer
      1. DATA_INTEGRATOR provides the names and types of the parameters to WDCS_SYSTEM
2. DATA_INTEGRATOR edits the pre- and postconditions of the newly created Transformer and creates e.g. the metadata relevant for composition (relationship from input to output with respect to the HUMBOLDT constraint model etc.)
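For the WPS case in step 1.a, the process description is obtained by sending a DescribeProcess request built from the registered WPS URL and process identifier. The following Java sketch assumes the key-value-pair GET encoding of WPS 1.0.0; the version number, the class and method names, and the example URLs are assumptions made for illustration and are not prescribed by this specification.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative only: builds a DescribeProcess URL, assuming the WPS 1.0.0 KVP encoding. */
final class DescribeProcessRequestSketch {

    static String describeProcessUrl(String wpsUrl, String processIdentifier) {
        // Choose the separator depending on whether the base URL already carries a query string.
        String separator;
        if (wpsUrl.endsWith("?") || wpsUrl.endsWith("&")) {
            separator = "";
        } else if (wpsUrl.contains("?")) {
            separator = "&";
        } else {
            separator = "?";
        }
        try {
            return wpsUrl + separator
                    + "service=WPS&version=1.0.0&request=DescribeProcess&identifier="
                    + URLEncoder.encode(processIdentifier, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The XML returned for such a request would then be parsed into the internal Transformer model; that parsing step is not shown here.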

Alternative processes

None

Exceptional situations

None

Processed data


None

Generated Data

A new Transformer description stored in the repository

Requirements

Rules of Processing

No specific rules of processing have been defined at the moment.

UC WS04 – Register Harmonisation Transformer

Identifier of the use case: UC WS04.

Version: 1.0, 10.11.2009

Description

This Use Case describes the process of registering a new harmonisation Transformer with the HUMBOLDT framework. This is done by the DATA_INTEGRATOR via the GUI of WDCS_SYSTEM. Use Case WS03 must have been carried out beforehand.

Actors

• DATA_INTEGRATOR
• WDCS_SYSTEM

Initial Conditions

UC WS03 has been carried out previously.

Final Results

A Transformer description that holds all information necessary for automatically handling this Transformer.

Processing

Main Process: Basic Workflow creation process

1. DATA_INTEGRATOR chooses the harmonisation category this Transformer belongs to from the list of harmonisation categories known to WDCS_SYSTEM.

2. DATA_INTEGRATOR specifies a mapping of parameters of the individual Transformer to the parameters of the generic signature belonging to that transformation category.
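The parameter mapping specified in step 2 can be pictured as a simple renaming table. The Java sketch below is illustrative only: the class and method names as well as the parameter names used in the usage note are hypothetical, since the generic signatures of the harmonisation categories are not listed in this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: applies a generic-to-specific parameter name mapping. */
final class GenericSignatureMappingSketch {

    /**
     * Translates arguments keyed by the parameter names of the generic signature
     * into arguments keyed by the parameter names of one concrete Transformer.
     *
     * @param genericToSpecific mapping registered for the concrete Transformer
     * @param genericArguments  argument values keyed by generic parameter name
     */
    static Map<String, String> toTransformerParameters(Map<String, String> genericToSpecific,
                                                       Map<String, String> genericArguments) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> argument : genericArguments.entrySet()) {
            String specificName = genericToSpecific.get(argument.getKey());
            if (specificName == null) {
                throw new IllegalArgumentException(
                        "No mapping for generic parameter: " + argument.getKey());
            }
            result.put(specificName, argument.getValue());
        }
        return result;
    }
}
```

For example, if a hypothetical generic signature has a parameter targetSRS and a concrete Transformer calls the same parameter epsgCode, the mapping targetSRS → epsgCode lets the WDCS fill in the concrete parameter without knowing the individual Transformer in advance.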

Alternative processes

None

Exceptional situations

None

Processed data

None

Generated Data

A new Transformer description of a harmonisation transformer.


Requirements

No specific requirements have been defined at the moment.

Rules of Processing

No specific rules of processing have been defined at the moment.

UC WS05 – Manage Transformer

Identifier of the use case: UC WS05.

Version: 1.0, 10.11.2009

Description

This Use Case describes the process of managing existing Transformers. The DATA_INTEGRATOR uses the WDCS_SYSTEM GUI to accomplish this task.

Actors

• DATA_INTEGRATOR
• WDCS_SYSTEM

Initial Conditions

DATA_INTEGRATOR has selected a Transformer to be edited.

Final Results

An edited/changed Transformer.

Processing

Main Process: Basic Workflow creation process

The process involves all management tasks related to an individual processing component:

1. DATA_INTEGRATOR edits the pre- and postconditions of the Transformer.
2. DATA_INTEGRATOR deletes the Transformer.
3. DATA_INTEGRATOR changes the binding (Java class name / WPS URL) of the Transformer.

Alternative processes

None

Exceptional situations

None

Processed data

None

Generated Data

An edited Transformer

Requirements

No specific requirements have been defined at the moment.

Rules of Processing

No specific rules of processing have been defined at the moment.


UC WS06 – Execute Workflow

Identifier of the use case: UC WS06.

Version: 1.0, 10.11.2009

Description

This Use Case describes the creation of an executable workflow (on the basis of a Basic Workflow) in the presence of a user request.

Actors

• MS_END_USER
• WDCS_SYSTEM
• MS_SYSTEM
• IGS_SYSTEM

Initial Conditions

MS_END_USER has sent a request to MS_SYSTEM.

Final Results

An executable workflow description.

Processing

Main Process: Basic Workflow creation process

2. WDCS_SYSTEM receives the request from MS_SYSTEM and identifies the Basic Workflow.
3. WDCS_SYSTEM enriches the Basic Workflow with request-specific information, such as the requested SRS
4. WDCS_SYSTEM contacts IGS_SYSTEM to discover grounding services that provide suitable input to the enriched Basic Workflow
   a. In case no perfect matches are found, WDCS_SYSTEM automatically inserts harmonisation Transformers into the Basic Workflow
5. WDCS_SYSTEM sends an executable workflow file (XML) to MS_SYSTEM

Alternative processes

None

Exceptional situations

No executable workflow can be built for a user request.

Processed data

- User request

- Basic Workflow

- IGS responses, containing metadata on data services

Generated Data

Executable Workflow

Requirements

No specific requirements have been defined at the moment.

Rules of Processing


No specific rules of processing have been defined at the moment.

Annex B – WDCS WSDL

<?xml version="1.0" encoding="UTF-16"?>
<wsdl:definitions name="WDCS_WSDL"
    targetNamespace="http://esdi-humboldt.eu/schemas/wdcsWsdlTypes"
    xmlns:ms="http://esdi-humboldt.eu/schemas/mediator"
    xmlns:tns="http://esdi-humboldt.eu/schemas/wdcs_wsdl"
    xmlns:ws="http://esdi-humboldt.eu/schemas/workflowexchange"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <wsdl:types>
    <xs:schema targetNamespace="http://esdi-humboldt.eu/schemas/wdcsWsdlTypes"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:import namespace="http://esdi-humboldt.eu/schemas/mediator"
          schemaLocation="http://esdi-humboldt.eu/schemas/mediator/mediator.xsd"/>
      <xs:import namespace="http://esdi-humboldt.eu/schemas/workflowexchange"
          schemaLocation="http://esdi-humboldt.eu/schemas/workflowexchange/workflowschema.xsd"/>
    </xs:schema>
  </wsdl:types>
  <wsdl:message name="getWorkflowResponse">
    <wsdl:part name="workflow" type="ws:WorkflowType"/>
  </wsdl:message>
  <wsdl:message name="getWorkflowRequest">
    <wsdl:part name="request" type="ms:MediatorComplexRequest"/>
  </wsdl:message>
  <wsdl:portType name="getWorkflow">
    <wsdl:operation name="getWorkflow">
      <wsdl:input name="Request" message="tns:getWorkflowRequest"/>
      <wsdl:output name="Response" message="tns:getWorkflowResponse"/>
    </wsdl:operation>
  </wsdl:portType>
  <wsdl:binding name="getWorkflowSOAP" type="tns:getWorkflow">
    <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>
    <wsdl:operation name="getWorkflowSOAP">
      <soap:operation soapAction="http://esdi-humboldt.eu/schemas/wdcs_wsdl/GetWorkflow" style="document"/>
      <wsdl:input name="Request">
        <soap:body use="literal"/>
      </wsdl:input>
      <wsdl:output name="Response">
        <soap:body use="literal"/>
      </wsdl:output>
    </wsdl:operation>
  </wsdl:binding>
  <wsdl:service name="WDCS">
    <wsdl:port name="getWorkflowSOAPPort" binding="tns:getWorkflowSOAP">
      <soap:address location="http://www.exampleURI.com/WSDLPackage1/SamplePortSOAP"/>
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>


Annex C – Figures

Figure 25: Large version of Figure 16


Figure 26: Large Version of Figure 17