Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006 Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services Konagaya Akihiko Project Director Advanced Genome Information Technology Research Group RIKEN Genomic Sciences Center

Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services

Page 1: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services

Konagaya Akihiko
Project Director
Advanced Genome Information Technology Research Group
RIKEN Genomic Sciences Center

Page 2: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Contents

• Role of Ontology
• Web Services for Bioinformatics
• Automatic Workflow Generation
• Lessons from our First Experience

Page 3: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Role of Ontology

Page 4: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Tacit and Explicit Knowledge

Michael Polanyi (1891-1976)    

We should start from the fact that 'we can know more than we can tell'.

Michael Polanyi, “The Tacit Dimension” 1967

Page 5: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Rainbow Color

How many colors can you see in a rainbow?

Page 6: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Ontology for Rainbow Colors

Color    RGB Value
Red      #FF0000
Orange   #FF8000
Yellow   #FFFF00
Green    #008000
Blue     #0000FF
Indigo   #000080
Purple   #800080

From 360 nm ~ 400 nm to 760 nm ~ 830 nm

All the colors you can see with your own eyes!

Page 7: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Which are Purple?

#800080
#800070
#800060
#800050
#700080
#600080
#500080

Page 8: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Representation by Elements and Constructor

[Figure: the seven colors #800080, #800070, #800060, #800050, #700080, #600080 and #500080 are grouped under Purple, which is labeled with two elements, RedElement and BlueElement.]
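The idea of representing Purple by elements and a constructor, rather than by enumerating every RGB value, might be sketched as follows. This is only an illustration: the rule in `is_purple` (red and blue elements at or above a threshold, no green element) and the threshold `0x50` are assumptions chosen to cover the seven values on the slide, not a definition given in the talk.

```python
def parse_rgb(hex_color: str) -> tuple[int, int, int]:
    """Split a '#RRGGBB' string into integer red, green and blue elements."""
    value = hex_color.lstrip("#")
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


def is_purple(hex_color: str, min_level: int = 0x50) -> bool:
    """Hypothetical constructor rule: strong red and blue elements, no green."""
    red, green, blue = parse_rgb(hex_color)
    return red >= min_level and blue >= min_level and green == 0


# The seven values shown on the "Which are Purple?" slide.
slide_colors = ["#800080", "#800070", "#800060", "#800050",
                "#700080", "#600080", "#500080"]
# A few rainbow colors that should not be classified as purple.
rainbow = {"Red": "#FF0000", "Blue": "#0000FF", "Indigo": "#000080"}

purple_flags = [is_purple(c) for c in slide_colors]
other_flags = {name: is_purple(c) for name, c in rainbow.items()}
```

A rule of this kind covers unseen values such as #780078 without listing them, which is the point of describing a concept by its elements.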

Page 9: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Web Services for Bioinformatics

Page 10: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Advantages of Web Services

[Figure: a workflow from Input through Task A, Task B, Task C and Task D to Output, running on Web Services that are backed by computing resources and the databases DB X, DB Y and DB Z]

• Freedom from maintaining biological databases and tools
• Scalability of computational resources
• High-level application programming interface

Page 11: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Very Simple Workflow

1. BLAST Search against UniProt: Sequence -> Hit table
2. GetEntry from UniProt: Hit table -> Sequences
3. CLUSTAL W: Sequences -> Multiple Alignment
4. Tree View: Multiple Alignment -> Phylogenetic tree

Page 12: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Manual Workflow on Web Apps

Page 13: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Web Service Programming

#!/usr/bin/perl

use SOAP::Lite;

# SOAP API
# specify WSDL
my $service = SOAP::Lite->service('http://xml.nig.ac.jp/wsdl/GetEntry.wsdl');

# call web service
$result = $service->getXML_DDBJEntry("AB000003");

# print result
print $result;

http://www.xml.nig.ac.jp/perl.txt

Page 14: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Why don’t we use workflow tools?

http://www.cyclonic.org/Taverna_and_myGrid.ppt

Page 15: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Need for a Tool that Generates Workflows Automatically from a Very High-Level Specification

Very high-level specification:
• apply Blastp to UniProt
• GetEntry from UniProt
• apply CLUSTALW
• apply TreeView

Automatic generation (?) -> Workflow for Bioinformatics Web Services

Page 16: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Automatic Generation of Bioinformatics Workflow

Page 17: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Task as Atomic Component of Workflow

Input Data Specification
sample{aa_sequence,fasta}

Application
sample{blastp DAD}

Output Data Specification
sample{ddbjentry,flatfile}
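A task with these three parts could be modeled as a small record, as in the minimal sketch below. The class and field names (`DataSpec`, `Task`, `can_follow`) are hypothetical, the sample values are copied from the slide, and the second task `to_fasta` is an invented converter used only to show how two tasks would chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSpec:
    """A data specification: what the data is and how it is formatted."""
    content: str  # e.g. "aa_sequence"
    fmt: str      # e.g. "fasta"


@dataclass(frozen=True)
class Task:
    """An atomic workflow component: application plus input/output specs."""
    name: str
    application: str
    database: str
    input_spec: DataSpec
    output_spec: DataSpec


def can_follow(first: Task, second: Task) -> bool:
    """Two tasks chain when the first one's output matches the second one's input."""
    return first.output_spec == second.input_spec


# Sample values taken from the slide.
blastp_dad = Task("blastp_dad", "blastp", "DAD",
                  DataSpec("aa_sequence", "fasta"),
                  DataSpec("ddbjentry", "flatfile"))

# Hypothetical converter task, not from the slides.
to_fasta = Task("to_fasta", "convert", "",
                DataSpec("ddbjentry", "flatfile"),
                DataSpec("aa_sequence", "fasta"))
```

Treating the specifications as plain comparable values keeps the compatibility check to a single equality test, which is what makes tasks usable as building blocks for search.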

Page 18: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Workflow as a Sequence of Tasks

Input

Output

Page 19: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Automatic Generation of Workflow from Given Input and Output Data Specification and Tasks

• Path Finding using Meta Information

Input

Output

TaskA

TaskB

TaskC

TaskD
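Path finding over task meta information can be illustrated with a plain breadth-first search, as sketched below. The task names follow the slide, but the data labels ("sequence", "hittable" and so on), the extra `TaskX`, and the function `find_workflow` are assumptions made for the example; the prototype itself was written in Prolog.

```python
from collections import deque

# (task name, input spec, output spec); the spec labels are hypothetical.
TASKS = [
    ("TaskA", "sequence", "hittable"),
    ("TaskB", "hittable", "idlist"),
    ("TaskC", "idlist", "entries"),
    ("TaskD", "entries", "alignment"),
    ("TaskX", "hittable", "report"),  # a dead end for this query
]


def find_workflow(tasks, source, target):
    """Return the shortest list of task names turning source into target, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        spec, path = queue.popleft()
        if spec == target:
            return path
        for name, in_spec, out_spec in tasks:
            # Follow a task only if it consumes the current data spec.
            if in_spec == spec and out_spec not in seen:
                seen.add(out_spec)
                queue.append((out_spec, path + [name]))
    return None


workflow = find_workflow(TASKS, "sequence", "alignment")
no_workflow = find_workflow(TASKS, "report", "sequence")
```

Here the user supplies only the input and output specifications; the sequence of tasks in between is discovered from the meta information attached to each task.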

Page 20: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Meta Information to Specify the Functionality of Task

TASK

Meta Data for Input
samples: {na_sequence,fasta} {aa_sequence,fasta}

Meta Data for Database
samples: {uniprot} {nt}

Meta Information for Command and Options
samples: {blastn} {getentry}

Meta Data for Output
samples: {ddbjentry,flatfile} {aablastentry,hittable}

Page 21: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Task Hierarchy (is_a)

[Figure: is_a hierarchy running from Abstract Task to Concrete Task. The abstract task Homology search (I: S, O: V) has the subtasks BLAST, FASTA and SSEARCH (each I: S, O: V). Concrete tasks include blastp, blastx, blastn, fasta and rfasta, annotated with input types N or A and output type id.]

I : Input Type
O : Output Type
S : Sequence or Sequence Name
V : Various Type
N : Nucleotide Sequence
A : Amino Acid Sequence
id : Accession ID
E : Database Entry
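An is_a hierarchy of this kind lets a specification name an abstract task and leave the choice of concrete task to the system. The sketch below shows one way to resolve that; the parent links are an assumption based on the task names in the figure, and `IS_A`, `CONCRETE`, `is_a` and `concrete_tasks` are hypothetical names.

```python
# Child -> parent links; assumed from the task names shown in the figure.
IS_A = {
    "BLAST": "Homology search",
    "FASTA": "Homology search",
    "SSEARCH": "Homology search",
    "blastp": "BLAST",
    "blastn": "BLAST",
    "blastx": "BLAST",
    "fasta": "FASTA",
}

# Tasks that can actually be executed (the leaves used in this sketch).
CONCRETE = {"blastp", "blastn", "blastx", "fasta"}


def is_a(task, ancestor):
    """True if task equals ancestor or inherits from it through IS_A links."""
    while task is not None:
        if task == ancestor:
            return True
        task = IS_A.get(task)
    return False


def concrete_tasks(abstract):
    """List the executable tasks that specialize the given abstract task."""
    return sorted(t for t in CONCRETE if is_a(t, abstract))


blast_family = concrete_tasks("BLAST")
homology_family = concrete_tasks("Homology search")
```

With such a lookup, a partial command like "alignment search" could be expanded into every concrete candidate before path finding starts.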

Page 22: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Task Hierarchy (has_a)

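Well-known/user-defined task: Genome Annotation [glimmer2, blastn, getEntry]

[Figure: Genome Annotation is composed of glimmer2, blastn and getEntry, each annotated with input and output type codes (N, S, id, E)]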

Page 23: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Prototype for ‘Proof of Concept’

• Language: tuProlog
  – Java to Prolog
  – Prolog to Java
• Web Service Interface through Java API
• Task Database
  – Prolog Clause Database
• Optimal Path Finding
  – Bidirectional Breadth First Search Algorithm
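A bidirectional breadth-first search expands one frontier forward from the input specification and another backward from the output specification, stopping when they meet. The following is a minimal Python sketch of that technique, not the prototype's tuProlog code; the task table and all names in it are hypothetical.

```python
# (task name, input spec, output spec); labels are hypothetical.
TASKS = [
    ("TaskA", "sequence", "hittable"),
    ("TaskB", "hittable", "idlist"),
    ("TaskC", "idlist", "entries"),
    ("TaskD", "entries", "alignment"),
    ("TaskX", "hittable", "report"),
]


def bidirectional_search(tasks, source, target):
    """Return task names leading from source to target, or None if unreachable."""
    if source == target:
        return []
    # Each map stores spec -> (neighbouring spec, task name) to rebuild the path.
    fwd, bwd = {source: None}, {target: None}
    fwd_frontier, bwd_frontier = [source], [target]

    def rebuild(meet):
        left, node = [], meet
        while fwd[node] is not None:      # walk back to the source
            node, name = fwd[node]
            left.append(name)
        left.reverse()
        right, node = [], meet
        while bwd[node] is not None:      # walk on to the target
            node, name = bwd[node]
            right.append(name)
        return left + right

    while fwd_frontier and bwd_frontier:
        # Expand the smaller frontier by one full level.
        forward = len(fwd_frontier) <= len(bwd_frontier)
        frontier = fwd_frontier if forward else bwd_frontier
        seen, other = (fwd, bwd) if forward else (bwd, fwd)
        next_frontier = []
        for spec in frontier:
            for name, in_spec, out_spec in tasks:
                here, there = (in_spec, out_spec) if forward else (out_spec, in_spec)
                if here != spec or there in seen:
                    continue
                seen[there] = (spec, name)
                if there in other:        # the two searches have met
                    return rebuild(there)
                next_frontier.append(there)
        if forward:
            fwd_frontier = next_frontier
        else:
            bwd_frontier = next_frontier
    return None


path = bidirectional_search(TASKS, "sequence", "alignment")
```

Because both sides grow level by level, each search explores a shallower tree than a one-sided search would, which matters when the task database holds well over a thousand entries.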

Page 24: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

System Overview

[Figure: system overview. Labeled components: User; User Specification; User Data; Result; UI; Workflow System; Prolog Engine (tuProlog); Workflow Library (Prolog); Web Service Library (Java); Knowledge Base; Task Database (1697); Web Service Information (1596); Workflow; Workflow Execution; Server; SPBIO; DDBJ.]

Page 25: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Screen Snapshot (Workflow Generation Phase)

Page 26: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Screen Snapshot (Workflow Execution Phase)

Page 27: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Phylogenetic Tree Obtained by a Generated Workflow when Applied to a Human Insulin Sequence

[Figure: phylogenetic tree with the following labels]
P61982 [Mus musculus]
P61981 [Homo sapiens]
Q5RC20 [Pongo pygmaeus]
P61983 [Rattus norvegicus]
Q5F3W6 [Gallus gallus]
P68252 [Bos taurus]
Q6PCG0 [Xenopus laevis]
Q6NRY9 [Xenopus laevis]
Q04917 [Homo sapiens]
P68509 [Bos taurus]
P68511 [Rattus norvegicus]
P68510 [Mus musculus]
Q6UFZ2 [Oncorhynchus mykiss]
Q6PC29 [Brachydanio rerio]
Q6UFZ3 [Oncorhynchus mykiss]

Page 28: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Lessons from our First Experience

Page 29: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Task Database (prototype)

Category                 Number of tasks
Web Service Call
  DDBJ Blast              453
  DDBJ SRS                638
  DDBJ GetEntry            38
  DDBJ ClustalW            62
  SPBIO Blast             405
Format Transformation      56
Data Selection             45
In Total                 1697

Page 30: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Test Set of Specification

Workflow No.  Input (Format, Type)  Output (Format, Type)
1             fasta, aasequence     gde, aamultiplealignment
2
3             fasta, aasequence     gde, aamultiplealignment
4             fasta, aasequence

Applications
1  blastp uniprot, filter num25, getfasta_swissentry, multiplealignment
2  blastp uniprot, filter num25, getfasta_swissentry, multiplealignment
3  alignmentsearch, filter, getentry, multiplealignment
4  filter, getentry, multiplealignment

Page 31: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Differences of Generated Workflow

ID     Description
1052   alignment by blastp from UNIPROT
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
10005  idlist[25] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession Number.
129    multi fasta format from fasta list
4018   multiplealignment by clustalw with blosum

1052   alignment by blastp from UNIPROT
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
1005   idlist[25] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession Number.
129    multi fasta format from fasta list
4001   multiplealignment by clustalw with blosum

1043   alignment by blastp from DAD
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
10001  idlist[25] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession Number.
129    multi fasta format from fasta list
4018   multiplealignment by clustalw with blosum

1043   alignment by blastp from DAD
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
10001  idlist[5] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession Number.
129    multi fasta format from fasta list
4001   multiplealignment by clustalw with blosum

No.  SolutionNum  time (ms)  First Match  WebServiceCall
1    8            11266
2    41           46704
3    over 100     249906
4    over 100     25297      X?

Meta Data
Input, Database, Output, Full Cmds
No input, Database, No output, Full Cmds
Input, No DB, No output, Partial Cmds
Input, No DB, Output, Partial Cmds

X?

Page 32: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Why Failed?

Lack of Interoperability Between the Web Services

[Figure: two parallel tasks. Input Amino Acid Sequence -> blastp (UNIPROT) -> Output HitTable; Input Amino Acid Sequence -> blastp (DAD) -> Output HitTable]

Page 33: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Very Similar but not the Same Format

Blastp for UniProt
sp|Q8HXV2|INS_PONPY Insulin precursor [Contains: Insulin B chain... 171 4e-43

Blastp for DAD
L15440-1|AAA59179.1| 107|Homo sapiens insulin protein. 177 1e-43
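The two hit-table lines carry the same kind of information but need different handling before a downstream task can use them. The sketch below shows a format-aware parser for exactly these two sample lines; `parse_hit_line` is a hypothetical helper, and the choice of which pipe-separated field counts as the identifier for DAD is an assumption.

```python
def parse_hit_line(line: str) -> dict:
    """Extract identifier, score and E-value from one hit-table line."""
    # The last two whitespace-separated tokens are the score and the E-value.
    head, score, evalue = line.rsplit(None, 2)
    fields = head.split("|")
    if fields[0] == "sp":
        # UniProt style: sp|<accession>|<entry name> <description>
        source, identifier = "UniProt", fields[1]
    else:
        # DAD style: <id>|<protein id>| <length>|<description>
        # Assumption: the first field is the identifier a later task needs.
        source, identifier = "DAD", fields[0]
    return {"source": source, "id": identifier,
            "score": int(score), "evalue": float(evalue)}


uniprot_line = ("sp|Q8HXV2|INS_PONPY Insulin precursor "
                "[Contains: Insulin B chain... 171 4e-43")
dad_line = "L15440-1|AAA59179.1| 107|Homo sapiens insulin protein. 177 1e-43"

uniprot_hit = parse_hit_line(uniprot_line)
dad_hit = parse_hit_line(dad_line)
```

A generated workflow that assumes one layout for every blastp service would pick the wrong field for the other, which is the kind of mismatch a shared ontology of data formats is meant to expose.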

Page 34: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Conclusion

• Web Services have great potential for sharing Bioinformatics Data and Tools all over the world

• Automatic Workflow Generation Tools are needed to make full use of Web Services

• Bioinformatics Ontology is a key to establishing Interoperability among Bioinformatics Web Services

Page 35: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Acknowledgement

• Daisuke Shinbara, Tokyo Institute of Technology (Hitachi, Ltd.)

• Sumi Yoshikawa, RIKEN GSC, TITECH

References

Akihiko Konagaya: “Bioinformatics Ontology: Towards the Automatics Generation of Bioinformatics Workflow for Web Services,” in Proc. of Distributed Applications, Web Services, Tools and GRID Infrastructures for Bioinformatics (NETTAB2006), S. Margherita di Pula, Italy (http://www.nettab.org/2006/), pp.75-82 (2006)

Akihiko Konagaya: “OBIGrid: Towards the 'Ba' for Sharing Resources, Services and Knowledge for Bioinformatics,” in Proc. of Fourth International Workshop on Biomedical Computations on the Grid (BioGrid), Singapore (CCGRID 2006), 37 (2006)

Page 36: Bioinformatics Ontology for Automatic Workflow Generation  on Web/Grid Services

Thank You for Listening