Page 1: BioMOBY Services

Workflows over Grid-based Web services: General framework and a practical case in structural biology

BioMOBY Services

Enrique de Andrés

Page 2: BioMOBY Services

2/20/2007 BioMOBY Services 2

Outline

• The problem
• The BioMOBY idea
• BioMOBY ontologies
• How BioMOBY works
• Message exchanges
• BioMOBY elements

Page 3: BioMOBY Services

The problem…

• Scientific work requires:
  – Data resources:
    • Genomic sequences, protein sets, expression data, …
  – Computational resources:
    • Similarity searches, alignments, domain prediction, functional classification, clustering, …
• Often, these resources exist and are available, but they are:
  – Hard to find.
  – Distributed all over the world.
  – Not in a common format.

Page 4: BioMOBY Services

Result… painful research!

Page 5: BioMOBY Services

Solution…

• Web Services:
  – Provide data or computational resources over the WWW.
  – Can be accessed automatically:
    • application-centric web
• Additional advantages:
  – Work for everyone who has Internet access:
    • No firewall obstacles, …
  – Independent of programming languages.
  – Use broadly accepted protocols.

Page 6: BioMOBY Services

Outline

• The problem
• The BioMOBY idea
• BioMOBY ontologies
• How BioMOBY works
• Message exchanges
• BioMOBY elements

Page 7: BioMOBY Services

BioMOBY

• BioMOBY was initiated in 2001 as a collaboration of some model organism database providers.
• A system for interoperability between biological data hosts and analytical services:
  – A simple, open source platform for discovery, integration, representation and retrieval of biological data.
• Two branches:
  – MOBY-S: follows the Web Service paradigm.
  – S-MOBY: uses semantic web technology (not covered here).

Page 8: BioMOBY Services

The MOBY-S plan

• Create an ontology of bioinformatics data-types.
• Define a serialization of this ontology (data syntax).
• Create an open API over this ontology (let independent service providers build data-types).
• Define Web Service inputs and outputs using that ontology.
• Register services in an ontology-aware registry.
• BioMOBY advantages:
  – Machines can find an appropriate service.
  – Machines can execute that service unattended.
  – The ontology is community-extensible.

Page 9: BioMOBY Services

MOBY-S vs. General WS

• The registry is the MOBY-Central.

• Usage of ontologies.

• BioMOBY services operate on MOBY objects.

• Usage of namespaces.

• Own messaging structure for registration, detection and invocation of services

Page 10: BioMOBY Services

Outline

• The problem
• The BioMOBY idea
• BioMOBY ontologies
  – Object ontology
  – Service ontology
• How BioMOBY works
• Message exchanges
• BioMOBY elements

Page 11: BioMOBY Services

BioMOBY ontology

• Ontology:
  – A formally defined system of things and relations between these things for representation of knowledge.
  – Usually, an ontology builds a hierarchy of objects to describe relations in a certain domain.
• BioMOBY ontology:
  – Usage of namespaces.
  – Object (data) ontology:
    • Semantic/syntactic data-types.
  – Service ontology.

Page 12: BioMOBY Services

Object ontology

• Any identifiable piece of data is an “entity”.

• Identifiers for these entities fall under “Namespaces”:
  – NCBI has gi numbers (gi namespace)
  – GO terms have accession numbers (GO namespace)

• Namespaces indicate the data’s semantic type:
  – GO:0003476 is a Gene Ontology term
  – gi|163483 is a GenBank record

• Namespace + ID precisely specifies a data “entity”

• Identifiers are not opaque – they are semantically rich

Page 13: BioMOBY Services

Object ontology

• Data types defined in an open, shared GO-like ontology:
  – GO used as a model because of its familiarity in the community.
  – Nodes define data classes.
  – Edges define the relationships between classes.
• Edges define one of three relationships:
  – ISA:
    • Inheritance relationship.
    • All properties of the parent are present in the child.
  – HASA:
    • Container relationship of exactly 1.
  – HAS:
    • Container relationship with 1 or more.

(Diagram: two nodes connected by an edge.)

Page 14: BioMOBY Services

The simplest MOBY data-type

<Object namespace='NCBI_gi' id='111076'/>

Object

The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s) or its representation.
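As a minimal sketch of this idea, the Python snippet below builds the base Object element with the standard library and reads the namespace/id pair back. The helper name `moby_object` is made up for illustration and is not part of any BioMOBY library.

```python
import xml.etree.ElementTree as ET


def moby_object(namespace: str, identifier: str) -> ET.Element:
    """Build the base MOBY Object: a namespace plus an id, with no content."""
    return ET.Element("Object", {"namespace": namespace, "id": identifier})


# The entity from the slide: gi number 111076 in the NCBI_gi namespace.
obj = moby_object("NCBI_gi", "111076")
xml_text = ET.tostring(obj, encoding="unicode")

# Reading the serialized form back recovers the same (namespace, id) pair.
parsed = ET.fromstring(xml_text)
entity_key = (parsed.get("namespace"), parsed.get("id"))
```

The pair in `entity_key` is all that is needed to name the entity; nothing in it says where the record lives or how it is encoded.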

Page 15: BioMOBY Services

Primitive Data-types

(Diagram: the primitive types below the root class Object.)
• Integer ISA Object
• String ISA Object
• Float ISA Object
• DateTime ISA Object

<Integer namespace='' id=''>38</Integer>

Page 16: BioMOBY Services

Derived data-types

<VirtualSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName="length">38</Integer>
</VirtualSequence>

(Diagram: ontology edges.)
• Integer ISA Object
• String ISA Object
• VirtualSequence ISA Object
• VirtualSequence HASA Integer

Page 17: BioMOBY Services

Derived data-types

<GenericSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName="length">38</Integer>
  <String namespace='' id='' articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</GenericSequence>

(Diagram: ontology edges.)
• Integer ISA Object
• String ISA Object
• VirtualSequence ISA Object
• VirtualSequence HASA Integer
• GenericSequence ISA VirtualSequence
• GenericSequence HASA String

Page 18: BioMOBY Services

Derived data-types

<DNASequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName="length">38</Integer>
  <String namespace='' id='' articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</DNASequence>

(Diagram: ontology edges.)
• Integer ISA Object
• String ISA Object
• VirtualSequence ISA Object
• VirtualSequence HASA Integer
• GenericSequence ISA VirtualSequence
• GenericSequence HASA String
• DNASequence ISA GenericSequence
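The inheritance shown on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries encode the ISA and HASA edges as read from the slide diagrams, and the function names (`lineage`, `is_a`, `members`) are hypothetical, not part of the MOBY API.

```python
# Parent of each data type along its ISA edge, as read from the slides.
ISA = {
    "Integer": "Object",
    "String": "Object",
    "Float": "Object",
    "DateTime": "Object",
    "VirtualSequence": "Object",
    "GenericSequence": "VirtualSequence",
    "DNASequence": "GenericSequence",
}

# Members each type introduces through HASA edges: articleName -> member type.
HASA = {
    "VirtualSequence": {"length": "Integer"},
    "GenericSequence": {"SequenceString": "String"},
}


def lineage(data_type):
    """Return the type followed by all of its ancestors, up to the root."""
    chain = [data_type]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def is_a(data_type, ancestor):
    """True if data_type equals ancestor or inherits from it."""
    return ancestor in lineage(data_type)


def members(data_type):
    """Collect HASA members, inherited ones first."""
    collected = {}
    for current in reversed(lineage(data_type)):
        collected.update(HASA.get(current, {}))
    return collected


dna_members = members("DNASequence")
```

Because ISA passes all properties of the parent to the child, `dna_members` contains both `length` and `SequenceString`, which matches the DNASequence XML on the slide. A service that accepts a GenericSequence could use `is_a` to decide that a DNASequence is also acceptable.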

Page 19: BioMOBY Services

Legacy file formats

• Containing a “String” allows us to define ontological classes that represent legacy data-types.

<NCBI_Blast_Report namespace='NCBI_gi' id='115325'>
  <String namespace='' id='' articleName='content'>
TBLASTN 2.0.4 [Feb-24-1998]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|1401126 (504 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
336,723 sequences; 677,679,054 total letters

Searching... done

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...  1009  0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...    58  4e-07
  </String>
</NCBI_Blast_Report>

Page 20: BioMOBY Services

Binaries – pictures, movies, …

• We base64-encode binaries, and then define a hierarchy of data classes that contain a String.

• base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String

<base64_encoded_jpeg namespace='TAIR_image' id='3343532'>
  <String namespace='' id='' articleName='content'>
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNVMIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUxHTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVlbWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEfMB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
  </String>
</base64_encoded_jpeg>
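A rough sketch of this wrapping step in Python, using only the standard library. The function names `wrap_binary` and `unwrap_binary` are hypothetical, and the byte string stands in for a real image file.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(data: bytes, data_type: str, namespace: str, identifier: str) -> ET.Element:
    """Base64-encode raw bytes and store them in a String member named 'content'."""
    outer = ET.Element(data_type, {"namespace": namespace, "id": identifier})
    content = ET.SubElement(
        outer, "String", {"namespace": "", "id": "", "articleName": "content"}
    )
    content.text = base64.b64encode(data).decode("ascii")
    return outer


def unwrap_binary(element: ET.Element) -> bytes:
    """Recover the original bytes from the 'content' String member."""
    content = element.find("String[@articleName='content']")
    return base64.b64decode(content.text)


# Placeholder bytes, not a real JPEG.
jpeg_bytes = b"\xff\xd8\xff\xe0 placeholder bytes standing in for an image"
element = wrap_binary(jpeg_bytes, "base64_encoded_jpeg", "TAIR_image", "3343532")
round_trip = unwrap_binary(element)
```

Since the payload travels as ordinary text inside a String, a client that only understands text/plain can still carry the object around without decoding it.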

Page 21: BioMOBY Services

Extending legacy data-types

• With legacy data-types defined, we can extend them as we see fit:
  – annotated_jpeg ISA base64_encoded_jpeg
  – annotated_jpeg HASA 2D_Coordinate_set
  – annotated_jpeg HASA Description

<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName="pixelCoordinates">
    <Integer namespace='' id='' articleName="x_coordinate">3554</Integer>
    <Integer namespace='' id='' articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16°C
  </String>
  <String namespace='' id='' articleName="content">
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>

Page 22: BioMOBY Services

Additional information

• Information Blocks allow additional information to be included in objects:
  – Cross Reference Information Blocks (CRIB)
  – Provision Information Blocks (PIB)

<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <CrossReference>
    <Object namespace="TAIR_Allele" id="ufo-1"/>
  </CrossReference>
  <2D_Coordinate_set namespace='' id='' articleName="pixelCoordinates">
    <Integer namespace='' id='' articleName="x_coordinate">3554</Integer>
    <Integer namespace='' id='' articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16°C
  </String>
  <String namespace='' id='' articleName="content">
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>

Page 23: BioMOBY Services

Cross Reference Information Blocks (CRIB)

• The content of the CRIB may include only two types of element:
  – A base MOBY Object ('Object' class):
    • a cross-referenced piece of data
  – An Xref-type cross-reference object:
    • a service which could be executed in order to interpret the meaning of the piece of data

<CrossReference>
  ... one or more cross-references ...
</CrossReference>

<Object namespace='' id=''/>

<Xref namespace='' id='' authURI='' serviceName='' evidenceCode='' xrefType=''>
  ... Description ...
</Xref>

Page 24: BioMOBY Services

Cross Reference Information Blocks (CRIB)

• namespace and id: fulfil the same role as in the Object-style cross-reference.

• authURI and serviceName: act as a unique identifier for a particular MOBY Service that the current service provider suggests you execute using this cross-reference (namespace/id) in order to correctly interpret its meaning.

• xrefType:
  – Should get its value from the Cross-Reference-Type Ontology, which defines a variety of semantic relationships that may exist between cross-references and the Objects that contain them. This ontology doesn't exist yet.
  – For now, xrefType values are free-form strings.

• evidenceCode: indicates the 'quality' of the evidence that was used to make the cross-reference assertion. It is a term from the GO evidence codes list:
  – IC: Inferred by Curator
  – IDA: Inferred from Direct Assay
  – …

Page 25: BioMOBY Services

Cross Reference Information Blocks (CRIB)

<moby:CrossReference>
  <moby:Object moby:namespace="PMID" moby:id="12511062"/>
  <moby:Object moby:namespace="PMID" moby:id="12075666"/>
  <moby:Xref moby:namespace="EMBL" moby:id="X112345" authURI="www.illuminae.com" serviceName="getEMBLRecord" evidenceCode="IEA" xrefType="transform"/>
</moby:CrossReference>
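A client could read such a block along the following lines. This is a sketch with assumptions: the slide does not show where the `moby` prefix is declared, so the snippet binds it to the namespace URI used in the message envelopes elsewhere in these slides, and `read_crib` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the moby prefix is bound to the namespace used by the MOBY envelope.
MOBY_NS = "http://www.biomoby.org/moby"

CRIB = f"""
<moby:CrossReference xmlns:moby="{MOBY_NS}">
  <moby:Object moby:namespace="PMID" moby:id="12511062"/>
  <moby:Object moby:namespace="PMID" moby:id="12075666"/>
  <moby:Xref moby:namespace="EMBL" moby:id="X112345"
             authURI="www.illuminae.com" serviceName="getEMBLRecord"
             evidenceCode="IEA" xrefType="transform"/>
</moby:CrossReference>
""".strip()


def qualified(name):
    """Return the ElementTree form of a name in the MOBY namespace."""
    return f"{{{MOBY_NS}}}{name}"


def read_crib(xml_text):
    """List every cross-reference; Xref entries also carry the service hints."""
    references = []
    for child in ET.fromstring(xml_text):
        reference = {
            "kind": child.tag.split("}")[-1],
            "namespace": child.get(qualified("namespace")),
            "id": child.get(qualified("id")),
        }
        if child.tag == qualified("Xref"):
            for name in ("authURI", "serviceName", "evidenceCode", "xrefType"):
                reference[name] = child.get(name)
        references.append(reference)
    return references


references = read_crib(CRIB)
```

The two Object entries only name related data, while the Xref entry additionally tells the client which service (authURI plus serviceName) the provider suggests for interpreting the reference.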

Page 26: BioMOBY Services

Provision Information Blocks (PIB)

• Contains metadata concerning the service that was invoked:
  – database version, software version, execution time
  – additional parameters used to invoke the service, ...

• In the current MOBY API, the content of these elements is only loosely defined, and is meant primarily to be human-readable.

<ProvisionInformation>
  ... one or more of the provision elements (below) ...
</ProvisionInformation>

<serviceSoftware software_name="" software_version="" software_comment=""/>

<serviceDatabase database_name="" database_version="" database_comment=""/>

<serviceComment>comment here</serviceComment>
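For illustration, a provider might assemble a PIB as in the sketch below. The element and attribute names come from the slide; the helper `provision_block` and all the values passed to it are invented placeholders.

```python
import xml.etree.ElementTree as ET


def provision_block(software, database, comment):
    """Build a ProvisionInformation block from (name, version, comment) triples."""
    block = ET.Element("ProvisionInformation")
    ET.SubElement(block, "serviceSoftware", {
        "software_name": software[0],
        "software_version": software[1],
        "software_comment": software[2],
    })
    ET.SubElement(block, "serviceDatabase", {
        "database_name": database[0],
        "database_version": database[1],
        "database_comment": database[2],
    })
    ET.SubElement(block, "serviceComment").text = comment
    return block


# Placeholder values only; a real service would report its own versions.
pib = provision_block(
    software=("example_aligner", "1.0", "default scoring matrix"),
    database=("example_db", "2007-02", "nightly snapshot"),
    comment="Executed in 0.8 s",
)
pib_xml = ET.tostring(pib, encoding="unicode")
```

Because the content is meant mainly for human readers, the sketch treats every field as free text and performs no validation.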

Page 27: BioMOBY Services

Service ontology

• Simple ISA hierarchy.

• Primitive types include the following, although the list can be modified:
  – Analysis
  – Parsing
  – Registration
  – Retrieval
  – Resolution
  – Conversion
  – Rendering

Page 28: BioMOBY Services

Service ontology

(Diagram: example ISA hierarchy rooted at Service, with the nodes Analysis, Alignment, Blast, NCBI_Blast, WU_Blast, Parsing and Parse_NCBI_Blast.)

Page 29: BioMOBY Services

Outline

• The problem
• The BioMOBY idea
• BioMOBY ontologies
• How BioMOBY works
• Message exchanges
• BioMOBY elements

Page 30: BioMOBY Services

How BioMOBY works

(Diagram: Moby Central, Service Client and Service Provider connected through the Internet.)

1) Service development
2) Service publication
3) Service discovery
4) Service request
5) Service response

Technologically, BioMOBY services are general Web Services.

Page 31: BioMOBY Services

How BioMOBY works

• BioMOBY defines a new layer on the protocol stack in order to work with its ontology.

• BioMOBY has its own messaging structure for registration, detection and invocation of services

Most common protocol stack in Moby (diagram):
• Network: TCP/IP, HTTP
• XML Message: SOAP
• Service Description: WSDL
• Biological Data: Moby
• Service Publication: UDDI
• Bio-service Publication: Moby
• Service Discovery: UDDI
• Bio-service Discovery: Moby

Page 32: BioMOBY Services

Outline

• The problem
• The BioMOBY idea
• BioMOBY ontologies
• How BioMOBY works
• Message exchanges
• BioMOBY elements

Page 33: BioMOBY Services

Client-Provider interaction

(Diagram: a single invocation carries BioMOBY service requests 0, 1, …, N; each request box is labelled with primary articles (simples / collections) and secondary articles.)

• 1 input parameter containing the full XML BioMOBY input
• 1 output parameter containing the full XML BioMOBY output

Protocol stack shown on the slide:
• Network: TCP/IP, HTTP
• XML Message: SOAP
• Service Description: WSDL
• Biological Data: Moby

Page 34: BioMOBY Services

Client → Provider messages

<?xml version="1.0" encoding="UTF-8"?>
<MOBY xmlns="http://www.biomoby.org/moby">
  <mobyContent>
    <mobyData queryID='0'>
      <!-- Primary/Secondary articles -->
    </mobyData>
    <mobyData queryID="1">
      <!-- Primary/Secondary articles -->
    </mobyData>
    …
    <mobyData queryID="N">
      <!-- Primary/Secondary articles -->
    </mobyData>
  </mobyContent>
</MOBY>

Several service requests in one invocation: each mobyData block is one BioMOBY service request (0, 1, …, N).
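The batching of several requests into one message can be sketched as follows. The helper names `simple` and `build_request` and the article name `sequence_id` are illustrative assumptions; the snippet only assembles the XML payload and leaves out the SOAP transport entirely.

```python
import xml.etree.ElementTree as ET

MOBY_NS = "http://www.biomoby.org/moby"


def simple(article_name, moby_object):
    """Wrap one ontology object in a Simple article."""
    article = ET.Element("Simple", {"articleName": article_name})
    article.append(moby_object)
    return article


def build_request(queries):
    """Build one MOBY message from a mapping queryID -> list of articles."""
    # Setting xmlns as a plain attribute puts every element in the MOBY namespace
    # once the text is parsed again.
    root = ET.Element("MOBY", {"xmlns": MOBY_NS})
    content = ET.SubElement(root, "mobyContent")
    for query_id, articles in queries.items():
        data = ET.SubElement(content, "mobyData", {"queryID": str(query_id)})
        data.extend(articles)
    return ET.tostring(root, encoding="unicode")


# Two independent requests sent in a single invocation.
request_xml = build_request({
    0: [simple("sequence_id", ET.Element("Object", {"namespace": "NCBI_gi", "id": "111076"}))],
    1: [simple("sequence_id", ET.Element("Object", {"namespace": "NCBI_gi", "id": "163483"}))],
})
```

Each queryID is chosen by the client, which lets it match the answers in the response back to the requests it sent.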

Page 35: BioMOBY Services

Provider → Client messages

<?xml version="1.0" encoding="UTF-8"?>
<MOBY xmlns="http://www.biomoby.org/moby">
  <mobyContent>
    <mobyData queryID='0'>
      <!-- Primary articles -->
    </mobyData>
    <mobyData queryID="1">
      <!-- Primary articles -->
    </mobyData>
    …
    <mobyData queryID="N">
      <!-- Primary articles -->
    </mobyData>
  </mobyContent>
</MOBY>

Several service responses in one invocation response: each mobyData block is one BioMOBY service response (0, 1, …, N).

Page 36: BioMOBY Services

Elemental requests/responses

<mobyData queryID='0'>
  <Simple articleName="in_or_out_data_name_0">
    <!-- object from the ontology -->
  </Simple>
  …
  <Collection articleName="in_or_out_data_name_1">
    <Simple>
      <!-- object from the ontology -->
    </Simple>
    …
  </Collection>
  …
  <Parameter articleName="in_param_name_0">
    <Value>param_value</Value>
  </Parameter>
  …
</mobyData>
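To make the three article kinds concrete, the sketch below reads one mobyData block and sorts its content into simples, collections and parameters. The article names (`query_term`, `candidate_sequences`, `max_hits`) and the function `read_moby_data` are hypothetical; the identifiers reuse examples from earlier slides.

```python
import xml.etree.ElementTree as ET

MOBY_DATA = """
<mobyData queryID="0">
  <Simple articleName="query_term">
    <Object namespace="GO" id="0003476"/>
  </Simple>
  <Collection articleName="candidate_sequences">
    <Simple><Object namespace="NCBI_gi" id="111076"/></Simple>
    <Simple><Object namespace="NCBI_gi" id="163483"/></Simple>
  </Collection>
  <Parameter articleName="max_hits">
    <Value>10</Value>
  </Parameter>
</mobyData>
""".strip()


def describe(simple_article):
    """Summarise the single ontology object held by a Simple article."""
    moby_object = simple_article[0]
    return (moby_object.tag, moby_object.get("namespace"), moby_object.get("id"))


def read_moby_data(xml_text):
    """Split one mobyData block into primary and secondary articles."""
    data = ET.fromstring(xml_text)
    result = {"queryID": data.get("queryID"),
              "simples": {}, "collections": {}, "parameters": {}}
    for article in data:
        name = article.get("articleName")
        if article.tag == "Simple":            # primary article, one object
            result["simples"][name] = describe(article)
        elif article.tag == "Collection":      # primary article, many objects
            result["collections"][name] = [describe(s) for s in article.findall("Simple")]
        elif article.tag == "Parameter":       # secondary article
            result["parameters"][name] = article.findtext("Value")
    return result


articles = read_moby_data(MOBY_DATA)
```

Simples and collections carry the data a service works on, while parameters only tune how the service runs, which is why the sketch keeps them in separate maps.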

Page 37: BioMOBY Services

Global service information

• Global service information block: serviceNotes

<?xml version="1.0" encoding="UTF-8"?>
<MOBY xmlns="http://www.biomoby.org/moby">
  <mobyContent>
    <serviceNotes>
      <Notes>Free text Service Notes</Notes>
    </serviceNotes>
    …
  </mobyContent>
</MOBY>

Page 38: BioMOBY Services

Error handling

• Extension of the global service information block (serviceNotes)

<serviceNotes>
  <mobyException severity="" refQueryID="" refElement="">
    <exceptionCode>code</exceptionCode>
    <exceptionMessage>message</exceptionMessage>
  </mobyException>
  <Notes>Free text Service Notes</Notes>
</serviceNotes>

• severity:
  – error: fatal error in the service
  – warning: the service detects an error or potential problem but continues
  – information: non-erroneous informative message
• refQueryID (optional): refers to the queryID of the offending input mobyData
• refElement (optional): refers to the article of the offending input simple or collection

Page 39: BioMOBY Services

Error handling: example response

<?xml version="1.0" encoding="UTF-8"?>
<MOBY xmlns="http://www.biomoby.org/moby">
  <mobyContent>
    <serviceNotes>
      <mobyException refElement="" refQueryID="1" severity="error">
        <exceptionCode>600</exceptionCode>
        <exceptionMessage>Unable to execute the service</exceptionMessage>
      </mobyException>
      <Notes>Free text Service Notes</Notes>
    </serviceNotes>
    <mobyData queryID="1"/>
  </mobyContent>
</MOBY>
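On the client side, a response like this one could be checked as sketched below. The functions `read_exceptions` and `failed_queries` are hypothetical helpers written for this example; the sample message repeats the slide's response without the XML declaration.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.biomoby.org/moby"}

RESPONSE = """
<MOBY xmlns="http://www.biomoby.org/moby">
  <mobyContent>
    <serviceNotes>
      <mobyException refElement="" refQueryID="1" severity="error">
        <exceptionCode>600</exceptionCode>
        <exceptionMessage>Unable to execute the service</exceptionMessage>
      </mobyException>
      <Notes>Free text Service Notes</Notes>
    </serviceNotes>
    <mobyData queryID="1"/>
  </mobyContent>
</MOBY>
""".strip()


def read_exceptions(xml_text):
    """Return every mobyException in the serviceNotes block as a dictionary."""
    root = ET.fromstring(xml_text)
    found = []
    for exc in root.findall("m:mobyContent/m:serviceNotes/m:mobyException", NS):
        found.append({
            "severity": exc.get("severity"),
            # Empty optional attributes are reported as None.
            "refQueryID": exc.get("refQueryID") or None,
            "refElement": exc.get("refElement") or None,
            "code": exc.findtext("m:exceptionCode", namespaces=NS),
            "message": exc.findtext("m:exceptionMessage", namespaces=NS),
        })
    return found


def failed_queries(xml_text):
    """queryIDs whose processing ended with a fatal error."""
    return {e["refQueryID"] for e in read_exceptions(xml_text)
            if e["severity"] == "error" and e["refQueryID"]}


exceptions = read_exceptions(RESPONSE)
failed = failed_queries(RESPONSE)
```

Because refQueryID points back at the offending mobyData, a client that batched several requests can retry only the ones listed in `failed`.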

Page 40: BioMOBY Services

Outline

• The problem
• The BioMOBY idea
• BioMOBY ontologies
• How BioMOBY works
• Message exchanges
• BioMOBY elements
  – MOBY-Central
  – Client side
  – Server side

Page 41: BioMOBY Services

BioMOBY Elements

(Diagram: Service Client, Service Provider and Moby Central connected through the Internet.)

Page 42: BioMOBY Services

Worldwide Distribution of MOBY Services

The Registry: Moby Central

(Diagram: Service Client, Service Provider and Moby Central connected through the Internet.)

• The Moby project provides Moby Central as a Perl server.

• It is a directory of services and datatypes, and of how to locate them.

Page 43: BioMOBY Services

Client Side

(Diagram: Moby Central, Service Provider and Service Client connected through the Internet.)

• There are different kinds of clients

• Some of them allow the creation of workflows

Programmatic libraries:

Page 44: BioMOBY Services

Client Side: MOWServ

• Web browser based client

• Discovery of services based on data type ontology or on service type ontology

• It makes it easy to connect service outputs to service inputs

• The interface helps with constructing Moby objects

Page 45: BioMOBY Services

Client Side: MOWServ

Data types and service ontologies

Page 46: BioMOBY Services

Client Side: MOWServ

1) Ontology browsing & service selection

2) Input submission

3) Output name selection

4) Service submission

5) Check execution status

6) Check results

Page 47: BioMOBY Services

Client Side: MOWServ

List of available services for this datatype object

Integrated HTML visualizer

Raw XML visualizer

Download MOBY object

Page 48: BioMOBY Services

Client Side: Taverna

• Java-based graphical integrated workbench

• It allows the construction of complex distributed workflows

• It can handle different kinds of services (Moby and others)

Page 49: BioMOBY Services

Client Side: Taverna

Processors = Webservices

Inputs

Outputs

Page 50: BioMOBY Services

Client Side: Dashboard

1) Select client execution tab

2) Select service to execute

3) Fill in input

4) Execute service

5) Check output

Page 51: BioMOBY Services

Client comparison

| Taverna | MOWServ | Dashboard |
|---|---|---|
| Easy to build workflows | Hard to build workflows | No workflow support |
| Discovery of services based on providers | Discovery of services based on ontology | Discovery of services based on ontology |
| Secondary inputs cannot be modified | Secondary inputs can be modified | Secondary inputs can be modified |
| Java program | Web browser access | Java program |

Page 52: BioMOBY Services

Server Side

(Diagram: Moby Central, Service Client and Service Provider connected through the Internet.)

• Moby provides libraries for easier service development on different platforms and in different languages (Perl and Java)

• These libraries provide an abstraction of the underlying protocols. The developer does not need to handle Internet connections or SOAP messages and can concentrate on the biological problem

Page 53: BioMOBY Services

Server Side: Steps for Developing MOBY services

• Design the MOBY Objects for the inputs/outputs of your service.
  • Register them if they don’t exist.

• Choose the MOBY Service Type for your service.
  • Register it if it doesn’t exist.

• Choose the MOBY Namespaces that your service will use.
  • Register them if they don’t exist.

• Construct your MOBY Service.

• Register your MOBY Service.

• Test your MOBY Service as a client (discover and execute it).
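The "construct your service" step can be illustrated with a toy handler. This sketch is not the Perl or Java MOBY library API: `sequence_length_service` is an invented example that takes the XML payload directly, answers each mobyData with the length of its input String, and skips SOAP, registration and error reporting. The article names `sequence` and `length` are also assumptions.

```python
import xml.etree.ElementTree as ET

MOBY_NS = "http://www.biomoby.org/moby"
NS = {"m": MOBY_NS}

REQUEST = """
<MOBY xmlns="http://www.biomoby.org/moby">
  <mobyContent>
    <mobyData queryID="0">
      <Simple articleName="sequence">
        <String namespace="" id="">ATGATGATAG</String>
      </Simple>
    </mobyData>
    <mobyData queryID="1">
      <Simple articleName="sequence">
        <String namespace="" id="">ATG</String>
      </Simple>
    </mobyData>
  </mobyContent>
</MOBY>
""".strip()


def sequence_length_service(request_xml):
    """Answer every request in the message, keeping each queryID."""
    request = ET.fromstring(request_xml)
    response = ET.Element("MOBY", {"xmlns": MOBY_NS})
    content = ET.SubElement(response, "mobyContent")
    for data in request.findall("m:mobyContent/m:mobyData", NS):
        sequence = data.findtext("m:Simple/m:String", default="", namespaces=NS).strip()
        # The response block reuses the queryID so the client can pair them up.
        out = ET.SubElement(content, "mobyData", {"queryID": data.get("queryID", "")})
        article = ET.SubElement(out, "Simple", {"articleName": "length"})
        integer = ET.SubElement(article, "Integer", {"namespace": "", "id": ""})
        integer.text = str(len(sequence))
    return ET.tostring(response, encoding="unicode")


response_xml = sequence_length_service(REQUEST)
```

Calling the function on a hand-written message, as done here, is also a simple way to carry out the final "test as a client" step before the service is registered.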

Page 54: BioMOBY Services

References

Page 55: BioMOBY Services

References

• BioMOBY homepage:
  – http://www.biomoby.org/
  – All the tools and libraries are downloadable via CVS

• Tutorial on INB Technologies (MSc on Bioinformatics for Health Sciences – Universitat Pompeu Fabra):
  – http://genome.imim.es/courses/INB2006/index.html

• PlaNet Workshop:
  – http://mips.gsf.de/projects/plants/PlaNetPortal/workshop/index.html

• Taverna:
  – http://taverna.sourceforge.net/

• MOWServ:
  – http://www.inab.org/MOWServ/

• Dashboard (as part of jMoby):
  – http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/