SIMDAT SIMDAT IXodus a knowledge discovery process based on the SIMDAT-Pharma GRID technologies Richard Kamuzinzi Université Libre de Bruxelles – Bioinformatics June, 5 – 7th 2007 World Wide Workflow GRID ASIA 2007 Singapore


IXodus: a knowledge discovery process based on the SIMDAT-Pharma GRID technologies

Richard Kamuzinzi
Université Libre de Bruxelles – Bioinformatics

June 5 – 7, 2007, World Wide Workflow GRID ASIA 2007, Singapore


SIMDAT Facts

• EU Information Society Technologies (IST)
• GRID Project
• Duration: 4 years
• Start date: September 1st 2004
• 26 partners


Scope

• Product and process development (automobiles, aircraft, drugs, meteorological services):
– is complex
– involves several independent organizations at different locations

• Managing this complexity at a single site is too expensive => cost/risk sharing with partners => GRID


Strategic objectives

• to test and enhance Data Grid technology for product development and production process design,

• to develop federated versions of problem-solving environments by leveraging enhanced Grid services,

• to exploit Data Grids as a basis for distributed knowledge discovery,

• to promote de facto standards for these enhanced Grid technologies across a range of disciplines and sectors, as well as

• to raise awareness of the advantages of Data Grids in important industrial sectors


Project organization (SIMDAT-Pharma)

NEC, GSK, Inpharmatica, ULB, Fraunhofer SCAI-Bio and UKA


IXodus – The scientific problem

• Lyme disease: a significant source of human and animal pathology in temperate areas of the world (identified in the 90s)

• Caused by the bite of a tick of the genus IXodes infected by the pathogenic bacterium Borrelia burgdorferi

• The study of host-parasite interactions is an active research area, as ~20% of ticks have been found to be infected by the bacterium

• IXodus scientific protocol: designed to characterise the genes expressed in the salivary gland of the tick IXodes ricinus at various stages of the host-parasite interaction process


IXodus – Workflow design (1)

• From the IXodus scientific protocol to the IXodus workflow (WF) design, we identify 2 use cases:

1. “New cDNA sequences”: the workflow is fed daily with a batch of nucleic sequences from the systematic sequencing of thousands of salivary gland cDNAs

2. “Databank update”: whenever a new version of a relevant biological databank appears, the core workflow analysis is re-enacted to discover potentially new information


IXodus design (2) Use Case 1

[Figure: IXodus UML 2.0 activity diagram, use case "New cDNA sequences". Swimlanes: Scientist, SYSTEM. Parts: sequences gathering, pre-processing, main analysis.]

• Activities: Provide new cDNA; Compare with IXodus DB; Build new virtual sequence; DB Annotate group membership; makeBlastN; makeBlastX; makeTBlastX; ORF finder; Motif search; Analysis by domain expert; DB Annotate "success"; DB Annotate "potential new"; DB Annotate "motif found"

• Guards: [similar AND (exact part)], [similar AND NOT (exact part)], [similar], [found], [else]

• Datastores: IXodus, EMBL, UNIPROT/GENPEPT, INTERPRO
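The main analysis part of the "New cDNA sequences" use case reads as a cascade of increasingly permissive searches. The following is a minimal sketch of one plausible reading of that cascade, not the IXodus implementation: the function `annotate`, the `Step` type and the stub steps `hit` and `no_hit` are hypothetical names, and the branch targets (in particular what happens after a BlastX or TBlastX match) are assumptions, since the diagram edges are not recoverable from the slide text.

```python
from typing import Callable, Optional

# Hypothetical step signature: takes a sequence, returns a hit or None.
Step = Callable[[str], Optional[str]]


def annotate(seq: str, blastn: Step, blastx: Step, tblastx: Step,
             orf_finder: Step, motif_search: Step) -> str:
    """Return the annotation label for one sequence (assumed branch order)."""
    # Assumption: a BlastN or BlastX match both lead to the "success" label.
    for search in (blastn, blastx):
        if search(seq) is not None:
            return "success"
    # Assumption: a TBlastX match is handed over to the domain expert.
    if tblastx(seq) is not None:
        return "analysis by domain expert"
    # Assumption: motif search runs only on an ORF that was actually found.
    orf = orf_finder(seq)
    if orf is not None and motif_search(orf) is not None:
        return "motif found"
    return "potential new"


def no_hit(_: str) -> Optional[str]:
    """Stub step that never finds anything."""
    return None


def hit(_: str) -> Optional[str]:
    """Stub step that always finds something."""
    return "hit"


if __name__ == "__main__":
    print(annotate("ATGGCC", no_hit, no_hit, no_hit, hit, hit))
```

Passing the search steps in as callables keeps the cascade independent of where each tool runs, which matches the idea of selecting a service instance separately from the workflow logic.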


IXodus design (3) Use Case 2

[Figure: IXodus UML 2.0 activity diagram, use case "Databank update". Swimlanes: Scientist, SYSTEM, SYSTEM Administrator.]

• Event processing part: after one week; Update Databank; Send TBlastX notification; Receive notification

• Analysis activities: makeBlastN; makeBlastX; makeTBlastX; ORF finder; Motif search; Analysis by domain expert; DB Annotate "success"; DB Annotate "potential new"; DB Annotate "motif found"

• Guards: [similar], [found], [else]

• Datastores: IXodus, EMBL, UNIPROT/GENPEPT, INTERPRO


IXodus – Implementation

• Workflow technology platform: InforSense™ KDE

• Implementation is tightly coupled with the deployment environment, which is mainly driven by 2 kinds of constraints:
– GRID approach
– Semantic Web (SW) approach


IXodus implementation - The test-bed GRID approach

[Figure: test-bed architecture, components connected over the Internet.]

• Knowledge DB IXodus

• InforSense KDE: IPRSCAN, BioTools; plugins: BioSense, Semantic Broker, GRIA; E2E Sec Client; GRIA Client

• Three service sites, each with: GRIA, NoDynA, EMBOSS & BLAST Tools, MRS Web Service Wrappers, E2E Sec Server

• Semantic-enabled service discovery, semantic-enabled service publication, OWL_DL reasoning

• Site labels: ULB, NEC – Semantic Broker, ULB-services, EMBL-services

Main properties

• Federated data and services with redundancy

• Privacy, AuthZ, AuthN, non-repudiation

• Intellectual property rights (IPR) preservation by traceability (digital signatures)

• User profile management to optimise resource availability


IXodus implementation - The test-bed SW approach

[Figure: test-bed architecture, components connected over the Internet.]

• InforSense KDE: IPRSCAN, BioTools; plugins: BioSense, Semantic Broker, GRIA; E2E Sec Client; GRIA Client

• Three service sites, each with: GRIA, NoDynA, EMBOSS & BLAST Tools, MRS Web Service Wrappers, E2E Sec Server

• Semantic-enabled service discovery, semantic-enabled service publication, OWL_DL reasoning

• Site labels: ULB, EMBL, NEC

Main properties

• Semantic-enabled service annotation

• Semantic-enabled service discovery: “Which service instance can operate on the latest version of the EMBL databank?”

• Dynamic update of already annotated services

Service advertising

Semantic Broker
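The discovery question quoted on this slide ("Which service instance can operate on the latest version of the EMBL databank?") can be illustrated with a small sketch. It assumes a flat in-memory registry and plain filtering; it does not reproduce the OWL_DL reasoning of the Semantic Broker, and `ServiceInstance`, `instances_on_latest` and all field names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceInstance:
    """Hypothetical annotation of one deployed service instance."""
    name: str              # e.g. "blastn"
    site: str              # hosting site, e.g. "ULB"
    databank: str          # databank the instance operates on
    databank_version: int  # release number of that databank


def instances_on_latest(registry: List[ServiceInstance],
                        databank: str) -> List[ServiceInstance]:
    """Return the instances that operate on the newest known release."""
    candidates = [s for s in registry if s.databank == databank]
    if not candidates:
        return []
    latest = max(s.databank_version for s in candidates)
    return [s for s in candidates if s.databank_version == latest]


if __name__ == "__main__":
    # Illustrative values only; release numbers are made up.
    registry = [
        ServiceInstance("blastn", "ULB", "EMBL", 90),
        ServiceInstance("blastn", "EMBL", "EMBL", 91),
    ]
    for instance in instances_on_latest(registry, "EMBL"):
        print(instance.site, instance.name, instance.databank_version)
```

Because the version annotation lives in the registry rather than in the workflow, an instance that is updated to a newer databank release becomes selectable without editing the workflow itself.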


IXodus implementation – InforSense KDE: The complete Workflow


IXodus implementation – InforSense KDE: User sequences gathering


IXodus implementation – InforSense KDE: Management of overlapping sequences


IXodus implementation – InforSense KDE: Main analysis flow (Bioinformatics tools)


IXodus implementation – InforSense KDE: Service instance selection & launching


IXodus - General benefits

• Workflow tool maturity: designing complex WFs that support demanding problems within a reasonable delivery time is a reality (RWD vs. RAD)

• The WF-on-GRID approach is really valuable and provides the confidence we need to face the data/services “tsunami” in the life sciences… the good news is …


IXodus - General benefits (2)

...thanks to WF technologies, scientists no longer fear the vertiginous “beast” (data/services explosion)…


IXodus – Remaining challenges

• B2A Grids: we still need a precise understanding of the strategic benefits for both sides (“win-win”)

• WF technologies: need a better distinction between “abstract” WF and “operational” WF:
– How to decouple them?
– Runtime service selection using the concept of rules?

• At design phase: the designer would appreciate a semantic approach to searching for services

• From WF to Service:
– Partial (∑args) vs. Complete (∑args)
– Different user profiles

• From WF to UI:
– At design phase: need to define how WF actors interact with the whole system

• Leverage the WF log to generate textual information that would support the writing of scientific papers/notebooks (who, service_name, service_version, database_version, …)
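The last challenge, turning the WF log into text for papers or notebooks, can be sketched as follows. The four field names (`who`, `service_name`, `service_version`, `database_version`) come from the slide; the record layout as a dictionary, the function `describe_step` and the sample values are assumptions made for illustration, not the actual log format.

```python
from typing import Dict, Iterable, List


def describe_step(record: Dict[str, str]) -> str:
    """Render one workflow log record as a methods-style sentence."""
    return (
        f"{record['who']} ran {record['service_name']} "
        f"(version {record['service_version']}) against "
        f"database version {record['database_version']}."
    )


def describe_run(records: Iterable[Dict[str, str]]) -> List[str]:
    """Render a whole run, one sentence per logged step, in log order."""
    return [describe_step(r) for r in records]


if __name__ == "__main__":
    # Hypothetical log entry; every value here is made up.
    log = [
        {
            "who": "user1",
            "service_name": "blastn",
            "service_version": "1.0",
            "database_version": "EMBL 91",
        }
    ]
    for sentence in describe_run(log):
        print(sentence)
```

Keeping the rendering separate from the log itself means the same records could also feed a table or a citation list.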


SIMDAT- Major outcomes to expect

SIMDAT approach will provide state-of-the-art components

• To enable industry-strength environment for e-Science activities

• To support academia/industry collaborations in R&D activities (B2B & B2A Grids)
– B2A Grids: how exactly is the “win-win” model configured?

• To help build up virtual organisations that federate data, services and scientific expertise


Thank you !

Web: http://www.simdat.org

Contact: [email protected]

Acknowledgments

co-author: Robert Herzog, Université Libre de Bruxelles (ULB)

Scientific expert: Valérie Ledent, ULB

Edmond Godfroid & Bernard Couvreur: Laboratory of Applied Genetics, ULB

SIMDAT colleagues: Joseph Mavor (ULB), Falk Zimmermann (NEC), Changtao Qu (NEC), Nabeel Azam (InforSense), Moustapha Ghanem (InforSense), Kai Kumpf (SCAI-Bio)