GGF11 Semantic Grid Applications Workshop, Hilton Hawaiian Village Beach Resort & Spa, Honolulu, Thursday June 10, 2004

Exploring Williams-Beuren Syndrome

Professor Carole Goble

http://www.mygrid.org.uk



Acknowledgements

myGrid is an EPSRC funded UK eScience Program Pilot Project

Particular thanks to the other members of the Taverna project, http://taverna.sf.net


Roadmap

• myGrid in a nutshell
• Gene characterisation in Williams-Beuren Syndrome
• Semantic aspects
  – Information model
  – Service discovery
  – Data management: LSID
  – Metadata management for provenance: RDF
• Lessons learnt and opportunities


In a nutshell

• Bioinformatics toolkit
• Open (Web) Services
  – myGrid components
  – External domain services
  – No control or influence over service providers
• Open to third party metadata
• Open extensible architecture
  – Assemble your own components
  – Designed to work together
  – Toolkit
  – Axis/Apache based
  – RDF and DAML+OIL/OWL
  – Jena, OilEd, Instance Store & FaCT

[Diagram: myGrid components. Labels: Freefluo (WfEE), Taverna (WfDE), View, UDDI registry, Event Notification, mIR, Pedro, Semantic Discovery, Info. Model, Soaplab, Gowlab, Gateway & CHEF Portal, LSID, Haystack Provenance Browser]


Williams-Beuren Syndrome

• Microdeletion of ~1.5 Mb at 7q11.23 on Chromosome 7 (~155 Mb)
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
• Characterise an unknown gene
• Annotation pipelines and gene expression analysis

Services from USA, Japan, various sites in UK


Williams-Beuren Syndrome Microdeletion

[Figure: physical map of the Williams-Beuren Syndrome microdeletion region. Chr 7 is ~155 Mb; the deleted region is ~1.5 Mb at 7q11.23.
Gene labels: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/E1f4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14.
Repeat blocks: C-cen, C-mid, A-cen, B-mid, B-cen, A-mid, B-tel, A-tel, C-tel. Block A: STAG3, PMS2L. Block B: GTF2IP, NCF1P, GTF2IRD2P. Block C: FKBP6T, POM121, NOLR1.
Patient deletions: WBS, SVAS.
Physical map: clones CTA-315H11 and CTB-51J22, with a gap.]


Filling a genomic gap

Two major steps:
• Extend into the gap: similarity searches; RepeatMasker, BLAST
• Characterise the new sequence: NIX, InterPro, etc.

• Numerous web-based services (e.g. BLAST, RepeatMasker)
• Cutting and pasting between screens
• Large number of steps
• Frequently repeated: info is now rapidly added to public databases
• Don’t always get results
• Time consuming
• Huge amount of interrelated data is produced, handled in lab book and files saved to local hard drive
• Mundane
• Much knowledge remains undocumented
• Bioinformatician does the analysis


Point, click, cut, paste

ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI


WBS Workflows:

[Workflow diagram; the layout was lost in extraction. The recoverable labels are listed below, grouped by the kind of sequence each service works on.]

Data items: GenBank Accession No; GenBank Entry; Nucleotide seq (Fasta); Coding sequence; ORFs; URL inc GB identifier; Query nucleotide sequence

Services on the nucleotide sequence:
• Seqret
• GenScan
• prettyseq: translation/sequence file, good for records and publications
• restrict: restriction enzyme map
• cpgreport: CpG island locations and %
• RepeatMasker: repetitive elements
• ncbiBlastWrapper: Blastn vs nr, est databases
• sixpack: 6 ORFs
• transeq: amino acid translation

Services on the translated sequence:
• epestfind: identifies PEST seq
• pepcoil: predicts coiled-coil regions
• pepstats: MW, length, charge, pI, etc.
• pscan: identifies FingerPRINTS
• SignalP, TargetP, PSORTII: predict cellular location
• InterPro, PFAM, Prosite, Smart: identify functional and structural domains/motifs
• Pepwindow? Octanol?: hydrophobic regions
• ncbiBlastWrapper: tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr

Further labels: RepeatMasker; ncbiBlastWrapper; Sort for appropriate sequences only

Legend:
• Pink: outputs/inputs of a service
• Purple: tailor-made services
• Green: EMBOSS Soaplab services
• Yellow: Manchester Soaplab services
• Grey: unknowns
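A workflow of this kind is a dataflow graph: a service can run once the services it depends on have produced their outputs. The sketch below shows one way to derive an invocation order from such a graph. The service names come from the slide, but the edges between them are assumptions made for illustration, since the original wiring is not recoverable, and `execution_order` is a hypothetical helper rather than anything in Taverna or FreeFluo.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical wiring: each service maps to the services whose output it needs.
# The names are from the slide; the edges are assumptions for illustration.
DEPENDS_ON = {
    "seqret": set(),
    "RepeatMasker": {"seqret"},
    "GenScan": {"RepeatMasker"},
    "ncbiBlastWrapper": {"RepeatMasker"},
    "transeq": {"GenScan"},
    "pepstats": {"transeq"},
}


def execution_order(depends_on):
    """Return one valid order in which the services could be invoked."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
print(order)
```

Any order returned respects the dependencies; services with no path between them (here GenScan and ncbiBlastWrapper) may appear in either order, which is also where an enactor could run them in parallel.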


Collections of Tasks

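[Diagram: collections of tasks and the people who carry them out.
Task labels: Finding, Description, Service Discovery, Enactment, Building, Workflow, Provenance, Storage, Data Management, Querying, Domain Tasks.
Roles: Service Providers, Bioinformaticians, Scientists, Annotation providers.]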


[Architecture diagram.
Components: Registry, mIR, Feta, Haystack Provenance Browser, FreeFluo (WfEE), Taverna (WfDE), Pedro annotation tool, Ontology Store, WSDL, Soaplab, others.
Artefacts: Interface Description, Annotation/description, Data descriptions, Vocabulary.
Roles: Annotation providers, Scientists, Bioinformaticians, Service Providers.
Activities: Query & Retrieve, Workflow Execution, Store data/knowledge, invoking, Querying/sharing/federating/registering.]


WBS task

• Wrap services as web services
• Register them
• Build a workflow using the services
• Evolve the workflow
• Run it over and over again in case data has changed
• Record results & provenance
• Inspect and compare results & provenance
• Event notification, portal, 3rd party annotation…


User Results

Benchmark: two iterations of workflows (1 day run)
– Reduced gap by 267 693 bp at its centromeric end
– Correctly located all seven known genes in this region
– Identified 33 of the 36 known exons residing in this location

Manually: takes two days (+) including analysis

Now: takes 30 mins to produce results and half a day for analysis.

• Less boring. Less prone to mistakes.
• Once notification is installed, won’t even have to initiate it.


Where is the semantics


Information Model v2

• Resources and Identifiers
• People, teams and organizations
• Representing the e-science process
• Experimental methods for e-science
• Scientific data and the life-science identifier
  – Types
  – Identifier Types
  – Values and Documents
• Provenance information
• Annotation and Argumentation

[UML class diagram.
Classes: Resources.Resource (+getId:URIString), ProgrammeResource (+name:String), Programme, <<Resource>> Study (+name:String, +description:String, +startTime:DateTime, +endTime:DateTime, +status:String), Investigation, <<Resource>> ExperimentDesign, ExperimentInstance, <<Resource>> Operations.Operation, <<Resource>> PeopleAndTeams.Person, <<Resource>> Agent, StudyRole (+roleName:String, +description:String), StudyParticipation, LabBookView (+name:String, +rule:String), SubjectObject.
Association labels: uses, contains, selected studies, method, acts in, labBooks, scmInvestigator, has participants, participates in, has instances.]

XML messages between services conform to the IMv2


Semantic discovery

• The User does the choosing of services
• A common ontology is used to annotate and query any myGrid object including services.
• Ontology is built using DAML+OIL and reasoning
• Deployed as a static RDF graph
• Discover workflows and services described in the registry via Taverna.
• Look for all workflows that accept an input of semantic type nucleotide sequence.
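The last query above can be pictured as a lookup over annotated registry entries combined with a walk up the ontology's class hierarchy. The following is a minimal sketch under stated assumptions: the class names, the registry entries and the helper functions are all hypothetical, and the rule that a workflow matches when its declared input type equals or generalises the requested type is one plausible reading, not the myGrid implementation.

```python
# Hypothetical fragment of an ontology: child class -> parent class.
SUBCLASS_OF = {
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "dna_sequence": "nucleotide_sequence",
}

# Hypothetical registry annotations: workflow name -> semantic type of its input.
REGISTRY = {
    "wbs_gap_extension": "nucleotide_sequence",
    "protein_characterisation": "protein_sequence",
    "generic_sequence_report": "biological_sequence",
}


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or is one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = SUBCLASS_OF.get(term)
    return False


def workflows_accepting(semantic_type):
    """Workflows whose declared input type covers the given semantic type."""
    return sorted(
        name for name, declared in REGISTRY.items() if is_a(semantic_type, declared)
    )


matches = workflows_accepting("nucleotide_sequence")
print(matches)  # ['generic_sequence_report', 'wbs_gap_extension']
```

Because the hierarchy is fixed at deployment time, the subclass closure can be computed once by a reasoner and stored, which is consistent with shipping the ontology as a static RDF graph.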


Role of Ontologies

[Diagram: "Ontologies" at the centre, linked to the roles below]

• Composing and validating workflows and service compositions & negotiations
• Describing & linking provenance records
• Change & event notification topics
• Resource annotations
• Service & resource registration & discovery
• Schema mediation
• Controlling contents of metadata and data
• Knowledge-based guidance and recommendation
• Service matching and provisioning
• Help


Observations


Services

• Practically all the services are remote and third party
• Services are changeable and unreliable
• Redundant services are essential
• WSDL in the wild is poor
• Automated annotation

http://pedro.man.ac.uk


Can you guess what it is yet?


Model of services

[Diagram: three linked entities and their attributes]

• service: name, description, author, organisation
  – WSDL service
• operation: name, description, input, output, task, method, resource, application
  – workflow
  – bioMoby service
  – WSDL operation
  – Soaplab service
• parameter: name, description, semantic type, format, transport type, collection type, collection format
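The three entities in the model map naturally onto plain record types. This is a sketch only: the attribute names follow the slide, while the class layout, the containment of operations inside a service, and the example values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    description: str = ""
    semantic_type: str = ""      # term from the ontology
    format: str = ""
    transport_type: str = ""
    collection_type: str = ""
    collection_format: str = ""


@dataclass
class Operation:
    name: str
    description: str = ""
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)
    task: str = ""
    method: str = ""
    resource: str = ""
    application: str = ""


@dataclass
class Service:
    name: str
    description: str = ""
    author: str = ""
    organisation: str = ""
    operations: List[Operation] = field(default_factory=list)  # assumed containment


# Illustrative values only; not a description of any real deployed service.
search = Operation(
    name="search",
    task="similarity_search",
    inputs=[Parameter(name="query", semantic_type="nucleotide_sequence", format="fasta")],
    outputs=[Parameter(name="report", semantic_type="BLAST_Report")],
)
blast = Service(name="example_blast", organisation="example.org", operations=[search])
print(blast.operations[0].inputs[0].semantic_type)  # nucleotide_sequence
```

Keeping the semantic type on the parameter rather than on the operation is what lets a registry answer questions such as "which operations accept a nucleotide sequence".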


SHIM Services

[Diagram: SHIM services sitting between the main bioinformatics applications and services]

• Services that enable domain services to fit together
• Outnumber domain services
• Libraries
• Candidates for automatic selection, composition and substitution
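A shim is typically a small format adapter placed between two domain services whose output and input do not quite match. The sketch below is illustrative: `fasta_to_raw` is a hypothetical shim, `gc_fraction` is a stand-in for a domain service that expects a bare sequence, and `compose` is a hypothetical helper that chains the two.

```python
def fasta_to_raw(fasta_text):
    """Shim: drop FASTA header lines and join the remaining sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def gc_fraction(sequence):
    """Stand-in domain service: fraction of G and C in a bare sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def compose(*steps):
    """Chain steps so that each one consumes the previous one's output."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


pipeline = compose(fasta_to_raw, gc_fraction)
result = pipeline(">example\nGGCC\nAATT\n")
print(result)  # 0.5
```

Because a shim like this carries no scientific meaning of its own, it is a natural candidate for being selected and inserted automatically once the input and output formats of the neighbouring services are described.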


Results management

• Automated workflows produce lots of heterogeneous data

• These are just some of the results from one workflow run for Williams Disease


Amplification

One input

Many outputs


Dealing with results

• FreeFluo is agnostic about the data flowing through it.
• Taverna includes a DataThing class, which can be tagged with terms from ontologies, free text descriptions and MIME types, and which may contain arbitrary collection structures.
• Using the metadata hints we can locate and launch pluggable view components.
• Hybrid typing scheme allows for a ‘best effort’ approach to data typing.
• Typing life science data completely is intractable for any reasonable effort.
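The idea of tagging results and using the tags as hints for choosing a viewer can be sketched as follows. This is not Taverna's DataThing API (which is Java); `TaggedData`, the viewer names and the MIME-type keys are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class TaggedData:
    """Hypothetical stand-in for a DataThing-like container."""
    value: Any
    mime_types: List[str] = field(default_factory=list)
    ontology_terms: List[str] = field(default_factory=list)
    description: str = ""


# Hypothetical table of pluggable viewers, keyed by MIME type.
VIEWERS = {
    "text/x-fasta": "sequence_viewer",
    "image/png": "image_viewer",
    "text/plain": "text_viewer",
}


def pick_viewer(item):
    """Best effort: use the first MIME hint with a known viewer, else fall back."""
    for mime in item.mime_types:
        viewer = VIEWERS.get(mime)
        if viewer is not None:
            return viewer
    return "raw_viewer"


sequence = TaggedData(
    value=">seq1\nACGT",
    mime_types=["text/x-fasta", "text/plain"],
    ontology_terms=["nucleotide_sequence"],
)
untyped = TaggedData(value=b"\x00\x01")
print(pick_viewer(sequence), pick_viewer(untyped))  # sequence_viewer raw_viewer
```

The fallback branch is the "best effort" part: data that arrives with no usable hints is still displayable, just less helpfully.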



Intermediate Results

• Workflows change the way the bioinformatician works
• Before: analyse results as you go along
• After: all results in one go
• So linking intermediate results is important


Life Science IDs

• LSID provides a uniform naming scheme.
• LSID Resolver guarantees to resolve to same data object.
• LSID Authority dishes them out.
• Also returns metadata of object.
• Used throughout myGrid as an object naming device.
• myGrid Repository acts as an LSID Authority.
• LSID allows universal access to results for collaboration, as well as for review.
• RDF+LSID explains the context of results, and provides guidance for further investigations.

Pioneered by myGrid

I3C / IBM / EBI proposal for a Life Science Identifier

http://www.i3c.org/wgr/ta/resources/lsid/docs/


Process Provenance


Link v Data Representation

• Data management questions refer to relationships rather than internal content
  – What are the origins of this data?
    • Which service produced this data?
    • Which data is this derived from?
    • Who was this data produced for?
    • ?What is this data telling me?
• Data analysis questions delegated to external services.


Representing links

• Identify each resource
  – Life science identifier: URI with associated data and metadata retrieval protocols.
  – Understanding that underlying data will not change

urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
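An LSID is a URN of the form `urn:lsid:authority:namespace:object`, optionally followed by `:revision`. A minimal sketch of splitting one into its parts is shown below; `LSID` and `parse_lsid` are hypothetical names, and the validation is deliberately basic.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:5])
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


first = parse_lsid("urn:lsid:taverna.sf.net:datathing:45fg6")
print(first.authority, first.namespace, first.object_id)  # taverna.sf.net datathing 45fg6
```

The authority part identifies who issued the name and therefore whom to ask for the data and metadata behind it.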


Representing links II

• Identify link type
  – Again use URI
  – Allows us to use RDF infrastructure
    • Repositories
    • Ontologies

urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3

http://www.mygrid.org.uk/ontology#derived_from
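With both the resources and the link type named by URIs, a link is simply an RDF triple. The sketch below writes one as an N-Triples line using only the standard library. The direction of the link (which of the two data items is derived from the other) is an assumption, since the slide's arrow did not survive extraction, and `to_ntriple` and `sources_of` are hypothetical helpers.

```python
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"

# Assumed direction: the first data item is derived from the second.
TRIPLES = [
    (
        "urn:lsid:taverna.sf.net:datathing:45fg6",
        DERIVED_FROM,
        "urn:lsid:taverna.sf.net:datathing:23ty3",
    ),
]


def to_ntriple(subject, predicate, obj):
    """Serialise one URI-to-URI statement as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def sources_of(item, triples):
    """Answer 'which data is this derived from?' by following derived_from links."""
    return [o for s, p, o in triples if s == item and p == DERIVED_FROM]


line = to_ntriple(*TRIPLES[0])
print(line)
print(sources_of("urn:lsid:taverna.sf.net:datathing:45fg6", TRIPLES))
```

In practice an RDF library and repository would replace the list of tuples; the point is only that provenance questions become graph lookups once links are expressed this way.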


[Diagram: three levels of provenance and the links between them]

Organisation level provenance: Project, Person, Organisation, Experiment design

Process level provenance: Workflow design, Workflow run, Process, Service, Event

Data/knowledge level provenance: Data items

Link labels:
• data derivation, e.g. output data derived from input data
• knowledge statements, e.g. similar protein sequence to
• instanceOf
• partOf
• componentProcess, e.g. web service invocation of BLAST @ NCBI
• componentEvent, e.g. completion of a web service invocation at 12.04pm
• runBy, e.g. BLAST @ NCBI
• run for

User can add templates to each workflow process to determine links between data items.


[BLAST report excerpt; column headers added during reconstruction]

GI         Accession     Score  Description
19747251   AC005089.3    831    Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
15145617   AC073846.6    815    Homo sapiens BAC clone RP11-622P13 from 7, complete sequence
15384807   AL365366.20   46.1   Human DNA sequence from clone RP11-553N16 on chromosome 1, complete sequence
7717376    AL163282.2    44.1   Homo sapiens chromosome 21 segment HS21C082
16304790   AL133523.5    44.1   Human chromosome 14 DNA sequence BAC R-775G15 of library RPCI-11 from chromosome 14 of Homo sapiens (Human), complete sequence
34367431   BX648272.1    44.1   Homo sapiens mRNA; cDNA DKFZp686G08119 (from clone DKFZp686G08119)
5629923    AC007298.17   44.1   Homo sapiens 12q22 BAC RPCI11-256L6 (Roswell Park Cancer Institute Human BAC Library) complete sequence
34533695   AK126986.1    44.1   Homo sapiens cDNA FLJ45040 fis, clone BRAWH3020486
20377057   AC069363.10   44.1   Homo sapiens chromosome 17, clone RP11-104J23, complete sequence
4191263    AL031674.1    44.1   Human DNA sequence from clone RP4-715N11 on chromosome 20q13.1-13.2 Contains two putative novel genes, ESTs, STSs and GSSs, complete sequence
17977487   AC093690.5    44.1   Homo sapiens BAC clone RP11-731I19 from 2, complete sequence
17048246   AC012568.7    44.1   Homo sapiens chromosome 15, clone RP11-342M21, complete sequence
14485328   AL355339.7    44.1   Human DNA sequence from clone RP11-461K13 on chromosome 10, complete sequence
5757554    AC007074.2    44.1   Homo sapiens PAC clone RP3-368G6 from X, complete sequence
4176355    AC005509.1    44.1   Homo sapiens chromosome 4 clone B200N5 map 4q25, complete sequence
2829108    AF042090.1    44.1   Homo sapiens chromosome 21q22.3 PAC 171F15, complete sequence

>gi|19747251|gb|AC005089.3| Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
AAGCTTTTCTGGCACTGTTTCCTTCTTCCTGATAACCAGAGAAGGAAAAGATCTCCATTTTACAGATGAGGAAACAGGCTCAGAGAGGTCAAGGCTCTGGCTCAAGGTCACACAGCCTGGGAACGGCAAAGCTGATATTCAAACCCAAGCATCTTGGCTCCAAAGCCCTGGTTTCTGTTCCCACTACTGTCAGTGACCTTGGCAAGCCCTGTCCTCCTCCGGGCTTCACTCTGCACACCTGTAACCTGGGGTTAAATGGGCTCACCTGGACTGTTGAGCG

[Provenance graph for the BLAST report, with panels labelled A and B. Captions: "Relationship BLAST report has with other items in the repository" and "Other classes of information related to BLAST report".
Data nodes: urn:lsid:taverna:datathing:15, urn:lsid:taverna:datathing:13
Types (rdf:type): ..BLAST_Report, ..nucleotide_sequence
Data-level links: ..similar_sequences_to, ..masked_sequence_of, ..filtered_version_of
Other nodes: service invocation, workflow invocation, workflow definition, experiment definition, project, person, group, service description, organisation
Other links: ..created_by, ..described_by, ..run_during, ..invocation_of, ..part_of, ..works_for, ..author, ..run_for]

Provenance tracking

• Automated generation of this web of links preferable
• Workflow enactor generates
  – LSIDs
  – Data derivation links
  – Knowledge links
  – Process links
  – Organisation links


Storage

• LSID has no protocol for storage
• Taverna/Freefluo implements its own data/metadata storage protocol

[Diagram: Taverna/Freefluo sends data and metadata through a publish interface to a Data store and a Metadata Store]


Retrieval

• LSID protocol used to retrieve data and metadata
• Query handled separately

[Diagram: Taverna/Freefluo publishes data and metadata through the publish interface to the Data store and Metadata Store. An LSID aware client retrieves from both stores through the LSID interface; an RDF aware client queries the Metadata Store.]
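The split between publishing, identifier-based retrieval and querying can be sketched as one repository object with three separate entry points. This is an illustration under assumptions: `Repository` and its method names are hypothetical and do not reproduce the LSID protocol's actual operations or the myGrid storage interface.

```python
class Repository:
    """Hypothetical repository with separate data and metadata stores."""

    def __init__(self):
        self._data = {}       # identifier -> bytes
        self._metadata = {}   # identifier -> list of (predicate, object)

    # Publish interface: used by the workflow enactor.
    def publish(self, lsid, data, metadata):
        if lsid in self._data and self._data[lsid] != data:
            raise ValueError("data behind an identifier must not change")
        self._data[lsid] = data
        self._metadata.setdefault(lsid, []).extend(metadata)

    # Identifier-based retrieval: used by an LSID aware client.
    def get_data(self, lsid):
        return self._data[lsid]

    def get_metadata(self, lsid):
        return list(self._metadata.get(lsid, []))

    # Query: handled separately, used by an RDF aware client.
    def query(self, predicate, obj):
        return sorted(
            lsid for lsid, pairs in self._metadata.items() if (predicate, obj) in pairs
        )


repo = Repository()
repo.publish("urn:lsid:example.org:datathing:1", b"ACGT", [("rdf:type", "nucleotide_sequence")])
repo.publish("urn:lsid:example.org:datathing:2", b"report", [("rdf:type", "BLAST_Report")])
print(repo.query("rdf:type", "BLAST_Report"))  # ['urn:lsid:example.org:datathing:2']
```

Refusing to overwrite data under an existing identifier reflects the understanding, stated earlier, that the data behind an LSID will not change, while metadata can keep accumulating.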


[Screenshot: IBM’s BioHaystack, showing a GenBank record, a portion of the web of provenance, and a managed collection of sequences for review]


Observations

• Managed the transition from generic middleware development to practical, day-to-day useful services
• Real users (plural) fundamental to that
• End to end support for an entire scenario
• Bury the semantics
• Show stoppers for practical adoption are not technical showstoppers
  – Can I incorporate my favourite service?
  – Can I manage the results?
• By tapping into (de facto) standards and communities we can leverage others’ results and tools: LSID, Haystack, Pedro.



myGrid People

Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.

Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK

Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire

Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)

Collaborators
• Keith Decker


http://www.mygrid.org.uk


Summary

• myGrid offers service based middleware components
• Open source and freely downloadable
• Open Grid Service Architecture-compliant
• Allows the scientist to be at the centre of the Grid: personalisation
• Generic middleware that suits the creation of bioinformatics applications
• Inclusion of rich semantics to facilitate the scientific process

Available from http://www.mygrid.org.uk