“Workflow” in Data Access and Integration
An OGSA-DAI/DAIS Perspective
Mario Antonioletti
EPCC
mario@epcc.ed.ac.uk

e-Science Workflow Services - www.ogsadai.org.uk 2

Talk Overview

- Background: OGSA-DAI and DAIS
- Motivation and Definitions
- Hierarchies of Service Coordination
- Conclusions

OGSA-DAI and DAIS

- GGF DAIS WG (Database Access and Integration Services): attempting to standardise interfaces based on OGSI.
- OGSA-DAI: aims to provide an implementation of DAIS and to serve the UK e-Science community.
- OGSA-DAI and DAIS are currently not aligned:
  - the data service interface in OGSA-DAI is coarse grained, based on an earlier version of DAIS;
  - the data service interface in DAIS is currently fine grained, with scope for more coarse-grained interfaces.
- OGSA-DAI will realign with DAIS once the latter stabilises.

OGSA-DAI Project Partners

Powered by ….

Simple Data Service Scenario

[Diagram: a Client talks to a Data Service, which sits in front of one or more Data Resources.]

A Data Service:
1. Provides access to a data resource.
2. May provide integration of several data resources.

Some Definitions

- Data Resource: an object that can source or sink data.
  - Currently databases are in scope.
  - Files and file systems may come into scope.
- Data Services: Grid services that
  - provide a common interface to data resources;
  - expose some capabilities of a data resource (SQL queries, XPath, BinX, …);
  - can also provide additional capabilities (transformations, third-party data delivery, etc.).

Motivation

- We want common interfaces for:
  - data access;
  - data integration.
- Requests to a data service may produce lots of data, so we want to minimise data movement.
- Hence encapsulate interactions with the service:
  - serialise multiple interactions into one interaction;
  - abstract each interaction into an "activity";
  - let data flow between activities;
  - use a document mechanism to describe this.
- DAIS and OGSA-DAI are concerned with data flow and currently have no control constructs: no looping, conditionals, splits, joins, …
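To make the "activity" idea concrete, here is a minimal Python sketch of serialising several interactions into one request. It assumes a hypothetical in-process engine (not the OGSA-DAI API): each activity reads the named streams produced so far and adds new ones, and only the stream the caller asks for is handed back, so intermediate data stays on the service side. The function names and the rows are illustrative only.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical in-process model, not the OGSA-DAI API: a stream is a list
# of rows, and an activity maps the streams seen so far to new streams.
Stream = List[tuple]
Activity = Callable[[Dict[str, Stream]], Dict[str, Stream]]

def run_request(activities: Iterable[Activity], deliver: str) -> Stream:
    """Run all activities in document order as a single request.

    Streams produced by earlier activities are visible to later ones;
    only the stream named by `deliver` is handed back to the caller.
    """
    streams: Dict[str, Stream] = {}
    for activity in activities:
        streams.update(activity(streams))
    return streams[deliver]

def query(_: Dict[str, Stream]) -> Dict[str, Stream]:
    # Stand-in for an SQL query activity; the rows are dummy data.
    return {"rows": [(10, "alice"), (11, "bob")]}

def filter_id_10(streams: Dict[str, Stream]) -> Dict[str, Stream]:
    # Stand-in for a transformation activity that consumes "rows".
    return {"filtered": [r for r in streams["rows"] if r[0] == 10]}

print(run_request([query, filter_id_10], deliver="filtered"))
# [(10, 'alice')]
```

Note that the engine has no branching or looping. Like the activity model described above, it only supports a sequence with data flow.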

Service Coordination Patterns

1. Coordination of activities performed at one Data Service (a Client talking to a single Data Service).
2. A client choreographs a set of services to work together … or a service may orchestrate on behalf of the client.
3. Orchestration of services using a document directed to one service.
4. Possibly interfacing with standard workflow languages, e.g. BPEL4WS, WSCI, …

Coordination Hierarchies

Service coordination may take place:
- Intra service: document based.
- Inter service, application driven: choreographed/orchestrated by a client or service.
- Inter service, document driven: orchestration; ideally this would look the same as the intra-service document-based interface.
- Combined with other workflow languages.

Intra Service Processing

- Service processing is described by a document.
- Possible activities (OGSA-DAI perspective):
  - Statement: SQL query, XPath query.
  - Delivery: input data from a third party, output data to a third party, deliver data in the response.
  - Transformations: XSL transformations, compression.
- OGSA-DAI has produced a framework for this.

Simple Example: no data flow

[Diagram: two activities, sqlQueryStatement and deliverToURL, with no stream connecting them.]

<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
</sqlQueryStatement>

<deliverToURL name="deliverOutput">
  <toURL>ftp://anon:[email protected]/home</toURL>
</deliverToURL>

Simple Example: with data flow

[Diagram: sqlQueryStatement writes its results to the stream output1, which deliverToURL reads.]

<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
  <resultSetStream name="output1"/>
</sqlQueryStatement>

<deliverToURL name="deliverOutput">
  <fromLocal from="output1"/>
  <toURL>ftp://anon:[email protected]/home</toURL>
</deliverToURL>
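Because activities are wired together only by stream names (`resultSetStream name=…` on the producer, `fromLocal from=…` on the consumer), a client could check the wiring before sending a document. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes, as a simplification and not as the OGSA-DAI schema rules, that `resultSetStream` is the only producer and `fromLocal` the only consumer, and that activities are the direct children of the root element. The `unresolved_streams` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Element names come from the example above; treating them as the only
# stream producer/consumer is a simplifying assumption for this sketch.
PRODUCERS = {"resultSetStream"}
CONSUMERS = {"fromLocal"}

def local(tag: str) -> str:
    # Strip any "{namespace}" prefix from an element tag.
    return tag.rsplit("}", 1)[-1]

def unresolved_streams(perform_xml: str) -> list:
    """Return fromLocal references that no earlier activity produces."""
    root = ET.fromstring(perform_xml)
    produced, missing = set(), []
    for activity in root:  # activities are processed in document order
        for elem in activity.iter():
            if local(elem.tag) in CONSUMERS and elem.get("from") not in produced:
                missing.append(elem.get("from"))
        # Outputs of this activity become visible to later activities only.
        for elem in activity.iter():
            if local(elem.tag) in PRODUCERS:
                produced.add(elem.get("name"))
    return missing
```

An empty list means every `fromLocal` refers to a stream declared by an earlier activity. Anything returned is a dangling reference that would otherwise only surface on the server.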

The Perform Document

<?xml version="1.0" encoding="UTF-8"?>
<gridDataServicePerform
    xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types
        ../../../../schema/ogsadai/xsd/activities/activities.xsd">
  <documentation>
    This example performs a simple select statement to retrieve
    one row from the test database. The results are delivered
    to the FTP URL given in the deliverToURL activity.
  </documentation>
  <sqlQueryStatement name="statement">
    <expression>
      select * from littleblackbook where id=10
    </expression>
    <resultSetStream name="output"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output"/>
    <toURL>ftp://anon:[email protected]/home</toURL>
  </deliverToURL>
</gridDataServicePerform>
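A perform document is plain XML, so a client can generate one rather than write it by hand. This is a minimal sketch with Python's standard `xml.etree.ElementTree` that mirrors the query-then-deliver example above. It is not an OGSA-DAI client API. The `build_perform` helper and the `ftp.example.org` URL are hypothetical, and the documentation element and `xsi:schemaLocation` hint are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the perform document above.
GDS_NS = "http://ogsadai.org.uk/namespaces/2003/07/gds/types"

def build_perform(sql: str, url: str) -> str:
    """Build a query-then-deliver perform document (hypothetical helper)."""
    # Declaring xmlns as a plain attribute puts every element in GDS_NS
    # once the serialised text is parsed.
    root = ET.Element("gridDataServicePerform", xmlns=GDS_NS)
    stmt = ET.SubElement(root, "sqlQueryStatement", name="statement")
    ET.SubElement(stmt, "expression").text = sql
    ET.SubElement(stmt, "resultSetStream", name="output")
    deliver = ET.SubElement(root, "deliverToURL", name="deliverOutput")
    # "from" is a Python keyword, so pass that attribute as a dict.
    ET.SubElement(deliver, "fromLocal", {"from": "output"})
    ET.SubElement(deliver, "toURL").text = url
    return ET.tostring(root, encoding="unicode")

doc = build_perform("select * from littleblackbook where id=10",
                    "ftp://ftp.example.org/home")
```

Generating the document this way keeps the stream name (`output`) in one place. This is where hand-edited documents tend to go wrong, for example through stray typographic quotes around attribute values.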

Predefined Building Blocks

- sqlQueryStatement
- sqlStoredProcedure
- sqlUpdateStatement
- sqlBulkLoadRowset
- xPathStatement
- xUpdateStatement
- xQueryStatement
- xmlResourceManagement
- xmlCollectionManagement
- relationalResourceManager
- gzipCompression
- zipArchive
- xslTransform
- inputStream
- outputStream
- DeliverFromURL
- DeliverToURL
- DeliverToGFTP
- DeliverFromGFTP
- DeliverToStream
- DeliverFromGDT
- DeliverToGDT

Activities: positives

- Simple sequence pattern with data flow.
- Avoid multiple message exchanges and minimise data movement.
- Extensible:
  - an XML Schema excerpt gives the syntax;
  - an implementation is associated with each activity at configuration time.
- Allows optimisation: the enactment engine can optimise the interaction.
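The extensibility point, where an implementation is bound to an activity name at configuration time, can be sketched as a name-to-function registry. This Python sketch is a hypothetical illustration and does not model the real OGSA-DAI configuration files. The `upperCase` activity is invented for the example. Adding an activity means registering a new function. The engine itself does not change.

```python
from typing import Callable, Dict, List

# Hypothetical configuration-time registry: activity element name -> code.
Implementation = Callable[[List[str]], List[str]]
REGISTRY: Dict[str, Implementation] = {}

def register(name: str) -> Callable[[Implementation], Implementation]:
    """Decorator that binds an implementation to an activity name."""
    def bind(func: Implementation) -> Implementation:
        REGISTRY[name] = func
        return func
    return bind

@register("upperCase")  # hypothetical activity name
def upper_case(lines: List[str]) -> List[str]:
    # Toy activity standing in for, e.g., a transformation.
    return [line.upper() for line in lines]

def enact(activity_names: List[str], data: List[str]) -> List[str]:
    """Look up each activity by name and apply it in sequence."""
    for name in activity_names:
        if name not in REGISTRY:
            raise KeyError(f"no implementation configured for {name!r}")
        data = REGISTRY[name](data)
    return data
```

The binding is by name only. This also illustrates one of the drawbacks listed next: nothing forces the schema for `upperCase` and the registered function to stay in sync.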

Activities: negatives

- Incomplete syntax:
  - activity inputs and outputs are not typed;
  - no typing of data streams;
  - possible issue in coming up with a sensible document.
- Activity implementation and XML schema are loosely coupled:
  - keeping activity and implementation in sync.
- Semantics are not specified.
- Puts workload on the server:
  - workloads on the server may need to be managed.
- Activities are not exposed at the interface level:
  - this may change in line with DAIS.
- Perform document factored out from the DAIS base specs:
  - standardisation to become a DAIS informational document;
  - scope may be bigger than DAIS.
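Untyped activity inputs and outputs mean a mismatched connection is only discovered when the document runs. This hedged sketch shows what typing could add. It is not part of OGSA-DAI or DAIS. Each activity declares type labels on its ports, and a checker rejects incompatible links before enactment. The labels (`rowset`, `xml`, `bytes`) and the rule that a `bytes` input accepts anything are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ActivitySpec:
    """An activity with typed ports (hypothetical type labels)."""
    name: str
    inputs: Dict[str, str] = field(default_factory=dict)
    outputs: Dict[str, str] = field(default_factory=dict)

def compatible(produced: str, expected: str) -> bool:
    # Assumption: a "bytes" input accepts any stream; otherwise types must match.
    return expected == "bytes" or produced == expected

def type_errors(specs: List[ActivitySpec],
                links: List[Tuple[str, str, str, str]]) -> List[str]:
    """Check links given as (source, output port, target, input port)."""
    by_name = {s.name: s for s in specs}
    errors = []
    for src, out_port, dst, in_port in links:
        produced = by_name[src].outputs[out_port]
        expected = by_name[dst].inputs[in_port]
        if not compatible(produced, expected):
            errors.append(f"{src}.{out_port}: {produced} -> "
                          f"{dst}.{in_port}: {expected}")
    return errors

specs = [
    ActivitySpec("statement", outputs={"output": "rowset"}),
    ActivitySpec("transform", inputs={"input": "xml"}, outputs={"output": "xml"}),
    ActivitySpec("deliver", inputs={"input": "bytes"}),
]
```

Here, linking `statement` to `deliver` passes, while linking it straight into `transform` is reported before any work reaches the server.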

Inter Service Application Defined "Workflow"

- Services are stitched together by an application, which:
  - could be a client, using the OGSA-DAI GridDataTransport (GDT) portType;
  - could be another service, e.g. Distributed Query Processing (DQP).
- Each service is configured separately and performs its part in the workflow.

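Client Driven Scenario (aka poor man's data integration)

[Diagram: the Client creates two Data Services and sends each a perform document; data moves between the two services over GDT.]

- Sent to the source Data Service: <sqlQueryStatement>…</sqlQueryStatement><deliverToGDT … />
- Sent to the sink Data Service: <inputStream … /><sqlUpdateStatement>…</sqlUpdateStatement>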
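The client-driven pattern can be imitated locally to show what the client has to coordinate. In this sketch, two in-memory SQLite databases (Python's standard `sqlite3`) stand in for the two Data Services. This is an assumption: no Grid services are involved, and whereas in the scenario the rows travel between services over GDT, here the client process carries them. The table names, rows and the `client_driven_copy` helper are hypothetical.

```python
import sqlite3

def make_source() -> sqlite3.Connection:
    # Stand-in for the source Data Service's database (dummy rows).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE myTable (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO myTable VALUES (?, ?)",
                   [(10, "alice"), (11, "bob")])
    return db

def make_sink() -> sqlite3.Connection:
    # Stand-in for the sink Data Service's database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE copy (id INTEGER, name TEXT)")
    return db

def client_driven_copy(source: sqlite3.Connection, sink: sqlite3.Connection,
                       query: str, insert: str) -> int:
    """The client coordinates two services: query one, update the other."""
    rows = source.execute(query).fetchall()  # ~ sqlQueryStatement + deliverToGDT
    sink.executemany(insert, rows)           # ~ inputStream + sqlUpdateStatement
    sink.commit()
    return len(rows)

source, sink = make_source(), make_sink()
moved = client_driven_copy(source, sink,
                           "SELECT id, name FROM myTable WHERE id = 10",
                           "INSERT INTO copy VALUES (?, ?)")
```

Every extra service adds another connection, query and hand-off for the client to manage, which is why this approach stays restricted to small numbers of services.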

Service Driven Scenario

[Diagram: Distributed Query Processing. The Client sends a query to the GDQS, which performs query planning, compilation, scheduling, evaluation and partitioning; three GQES instances evaluate the sub-queries.]

More Complex DQP Scenario

[Figure: the Client and a GDQS on node N0; GQES factories on nodes N1 to N4. Steps: (1) the client sends perform(Query) to the GDQS; (2) the GDQS calls createService on the GQES factories; (3) the GDQS sends perform(QuerySubplan) to each GQES, and each evaluator node hosts a GDS and a GQES that exchange data over GDT; (4) results return to the client. Sub-plan operators shown: sequential_scan then reduce(proteinID, sequence); sequential_scan(term=8372) then reduce(proteinID); hash_join(p.proteinID = t.proteinID); operation_call blast(p.sequence) against the BLAST Web Service, then reduce(p.proteinID, blast).]
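The `hash_join(p.proteinID = t.proteinID)` operator in the plan above is a relational hash join. Here is a minimal single-process Python sketch. It assumes the two reduced inputs arrive as lists of dicts. The field names follow the figure, and the rows are dummy data. The sketch builds a hash table on one input and then probes it with the other. In the distributed plan, the same build/probe logic would run inside one query evaluator, with its inputs arriving from other evaluators.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]

def hash_join(build: List[Row], probe: List[Row], key: str) -> List[Row]:
    """Equi-join `build` and `probe` on `key`: build phase, then probe phase."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in build:                       # build: hash one input on the key
        table[row[key]].append(row)
    joined = []
    for row in probe:                       # probe: look up matching rows
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Dummy inputs shaped like reduce(proteinID) and reduce(proteinID, sequence).
terms = [{"proteinID": 2}]
proteins = [{"proteinID": 1, "sequence": "MKV"},
            {"proteinID": 2, "sequence": "GAT"}]
result = hash_join(terms, proteins, "proteinID")
```

Building on the smaller input (here the term matches) keeps the hash table small. The larger input is streamed past it once.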

Application Driven "Workflow"

- Labour intensive.
- Client driven (service choreography):
  - restricted to small numbers of services;
  - needs tooling, and even then this is best done through other means.
- Service driven (service orchestration):
  - DQP hides the details;
  - there may be other examples …
- Need to explore this space further:
  - can probably accommodate these patterns in an existing workflow language.
- For more general data integration we need to describe more sophisticated behaviour.

Inter Service Document Coordination

- Currently evolving.
- The document describes a sequence of operations that may span multiple services.
- A single document includes enough information to:
  - run an expression on a source data service;
  - deliver the results to a target data service;
  - run an expression on the target data service.
- An informational document is to be presented at GGF10.

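A Dataset Example

[Diagram: a Client sends a request to a Data Service, which passes a data access recipe for a required table to a remote Data Service.]

- Request (DataRequest.xsd):
  <dataRequest> … </dataRequest>
- Data access recipe (DataAccessRecipe.xsd):
  <dar>
    <gsh> … </gsh>
    <type> … </type>
    <dataSet> … </dataSet>
  </dar>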

Document Driven "Workflow"

- Work in this area is tentative, with no implementations as yet:
  - OGSA-DAI needs to see how it matures.
- Shows versatility: carries over some of the OGSA-DAI activity framework.
- Focused on data: can track provenance in the dataSet.
- Needs to be positioned against general workflow languages.
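Tracking provenance in the dataSet can be pictured as a record that each service extends as it processes the data. This is a hypothetical Python model. The `DataSet` class, its fields and the service names are illustrative and are not the DataAccessRecipe schema. Each step returns a new record whose history includes the step, so the original record is never modified.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class DataSet:
    """Hypothetical dataSet: rows plus the steps that produced them."""
    rows: List[tuple]
    provenance: List[Tuple[str, str]] = field(default_factory=list)

    def apply(self, service: str, operation: str,
              fn: Callable[[List[tuple]], List[tuple]]) -> "DataSet":
        # Return a new DataSet whose history records this step.
        return DataSet(fn(self.rows), self.provenance + [(service, operation)])

raw = DataSet([(10, "alice"), (11, "bob")],
              [("serviceA", "select * from myTable")])
filtered = raw.apply("serviceB", "filter id=10",
                     lambda rows: [r for r in rows if r[0] == 10])
```

Because the history travels with the data, any service that receives the dataSet can report how it was derived without consulting the services upstream.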

Traditional Workflow

- OGSA-DAI has not explored this space … yet.
  - It may need such a framework to facilitate data integration.
- Traditionally, workflow:
  - revolves around the execution of atomic activities;
  - uses a processing model, e.g. WfMC based;
  - is akin to how people talk about service orchestration.
- We want to use existing frameworks as far as possible:
  - OGSA-DAI does not want to define its own workflow language;
  - DAIS may come up with something.
- Clearly, the activity model can be used to implement a workflow.
- Collecting use cases.
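The observation that the activity model can be used to implement a workflow can be illustrated with Python's standard `graphlib` module (Python 3.9+). Atomic activities run in an order consistent with their dependencies, which generalises the simple sequence of a perform document. The step names, the dependency graph and the toy data are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

Step = Callable[[Dict[str, object]], object]

def run_workflow(steps: Dict[str, Step],
                 depends_on: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run each atomic step after all the steps it depends on.

    Every step must appear in `depends_on`, as a key or as a predecessor.
    """
    results: Dict[str, object] = {}
    for name in TopologicalSorter(depends_on).static_order():
        results[name] = steps[name](results)
    return results

steps: Dict[str, Step] = {
    "queryA": lambda r: [1, 2, 3],
    "queryB": lambda r: [2, 3, 4],
    "join": lambda r: sorted(set(r["queryA"]) & set(r["queryB"])),
    "deliver": lambda r: f"delivered {len(r['join'])} rows",
}
depends_on = {"join": {"queryA", "queryB"}, "deliver": {"join"}}
results = run_workflow(steps, depends_on)
```

`queryA` and `queryB` have no dependency on each other, so a real enactment engine could run them concurrently. That is the kind of split/join control the current activity documents cannot express.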

Workflow Issues

- OGSA-DAI needs to experiment to see what works; standards are still evolving.
- IP rights:
  - BPEL4WS: royalty-free … ?
  - WSCI: royalty-free.
- Need workflow engines.
- Tooling to construct workflows: Ptolemy II … Triana … ?

Summary & Conclusions

- Base standards are in a state of flux.
- DAIS has not settled down yet: if you don't like what you see, get involved and change it.
- The document-based interface needs to be reworked.
- OGSA-DAI implemented simple "workflow" patterns:
  - successful for data access;
  - shied away from real workflow;
  - should try to use emerging standards if possible.
- Data integration will require workflow patterns: we need to examine use cases.
- Positioning of OGSA-DAI: we want it to be the leaves of your complex workflow graphs, wrapping your data sources and sinks.
- Try OGSA-DAI and give us feedback!

Further information

- The OGSA-DAI Project Site: http://www.ogsadai.org.uk
- The DAIS-WG site: http://cs.man.ac.uk/grid-db
- OGSA-DAI Users Mailing list: [email protected] (general discussion on grid DAI matters)
- Formal support for OGSA-DAI releases: http://www.ogsadai.org.uk/support, [email protected]
- OGSA-DAI training courses