A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006
Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services
Konagaya Akihiko
Project Director
Advanced Genome Information Technology Research Group
RIKEN Genomic Sciences Center
Contents
• Role of Ontology
• Web Services for Bioinformatics
• Automatic Workflow Generation
• Lessons from our First Experience
Role of Ontology
Tacit and Explicit Knowledge
Michael Polanyi (1891-1976)
We should start from the fact that 'we can know more than we can tell'.
Michael Polanyi, “The Tacit Dimension” 1967
Rainbow Color
How many colors can you see in a rainbow?
Ontology for Rainbow Colors
Color | RGB Value
Red | #FF0000
Orange | #FF8000
Yellow | #FFFF00
Green | #008000
Blue | #0000FF
Indigo | #000080
Purple | #800080
From 360~400 nm to 760~830 nm
All the colors you can see with your own eyes!
Which are Purple?
#800050, #800060, #800070, #800080, #700080, #600080, #500080
Representation by Elements and Constructor
#800050, #800060, #800070, #800080, #700080, #600080, #500080
Each of these is represented as Purple, built by a constructor from a Red Element and a Blue Element.
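The idea of representing a concept by its elements can be sketched as a membership test. This is a minimal illustration, not the ontology from the talk: the rule used here (a red element and a blue element are present, and no green element) is an assumption that happens to cover the seven values listed on the slide, and `parse_rgb` and `is_purple` are hypothetical names.

```python
def parse_rgb(hex_color: str) -> tuple[int, int, int]:
    """Split '#RRGGBB' into integer red, green and blue elements."""
    value = hex_color.lstrip("#")
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


def is_purple(hex_color: str) -> bool:
    """Assumed constructor rule: Purple = Red Element + Blue Element, no green."""
    red, green, blue = parse_rgb(hex_color)
    return red > 0 and blue > 0 and green == 0


# The seven candidate values shown on the slide.
CANDIDATES = ["#800050", "#800060", "#800070", "#800080",
              "#700080", "#600080", "#500080"]

if __name__ == "__main__":
    for color in CANDIDATES:
        print(color, is_purple(color))
```

With this rule a single definition covers every shade, instead of enumerating each RGB value as a separate "purple".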
Web Services for Bioinformatics
Advantages of Web Services
[Figure: a workflow from Input through Task A, Task B, Task C and Task D to Output, built on Web Services that provide computing resources and databases DB X, DB Y and DB Z]
• Freedom from maintaining biological databases and tools
• Scalability of computational resources
• High-level application programming interface
Very Simple Workflow
Sequence -> BLAST Search (UniProt) -> Hit table -> GetEntry (UniProt) -> Sequences -> CLUSTAL W -> Multiple Alignment -> Tree View -> Phylogenetic tree
Manual Workflow on Web Apps
Web Service Programming
#!/usr/bin/perl
use SOAP::Lite;

# SOAP API: specify the WSDL of the service
my $service = SOAP::Lite->service('http://xml.nig.ac.jp/wsdl/GetEntry.wsdl');

# call the web service with a DDBJ accession number
my $result = $service->getXML_DDBJEntry("AB000003");

# print the result
print $result;
http://www.xml.nig.ac.jp/perl.txt
Why don’t we use workflow tools?
http://www.cyclonic.org/Taverna_and_myGrid.ppt
Need: a Tool for Automatic Workflow Generation from a Very High-Level Specification
Very high-level specification:
apply Blastp to UniProt
GetEntry from UniProt
apply CLUSTALW
apply TreeView
-> Automatic generation? -> Workflow for Bioinformatics Web Services
Automatic Generation of Bioinformatics Workflow
Task as Atomic Component of Workflow
Input Data Specification: sample {aa_sequence,fasta}
Application: sample {blastp DAD}
Output Data Specification: sample {ddbjentry,flatfile}
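A task of this shape can be modeled as a small record. The sketch below is only an illustration of the slide's three parts (input specification, application, output specification). The class and field names (`DataSpec`, `Task`, `accepts`) are hypothetical, and the prototype described in this talk stores tasks as Prolog clauses rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSpec:
    """A data specification such as {aa_sequence,fasta}."""
    data_type: str
    data_format: str


@dataclass(frozen=True)
class Task:
    """An atomic workflow component: input spec, application, output spec."""
    application: str
    database: str
    input_spec: DataSpec
    output_spec: DataSpec

    def accepts(self, spec: DataSpec) -> bool:
        """A task can run only on data matching its input specification."""
        return spec == self.input_spec


# The sample task from the slide: blastp against DAD.
blastp_dad = Task(
    application="blastp",
    database="DAD",
    input_spec=DataSpec("aa_sequence", "fasta"),
    output_spec=DataSpec("ddbjentry", "flatfile"),
)

if __name__ == "__main__":
    print(blastp_dad.accepts(DataSpec("aa_sequence", "fasta")))
```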
Workflow as a Sequence of Tasks
Input
Output
Automatic Generation of Workflow from Given Input and Output Data Specification and Tasks
• Path Finding using Meta Information
[Figure: candidate paths from Input to Output through Task A, Task B, Task C and Task D]
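Path finding over meta information can be sketched as a search in which data specifications are nodes and tasks are edges. The example below uses a plain forward breadth-first search. The task table is hypothetical (names such as `extract_ids` and the `multifasta` format are invented for illustration), and `find_workflow` is not the prototype's API.

```python
from collections import deque

# Hypothetical task table: name -> (input spec, output spec),
# where each spec is a (type, format) pair.
TASKS = {
    "blastp": (("aa_sequence", "fasta"), ("aablastentry", "hittable")),
    "blastn": (("na_sequence", "fasta"), ("nablastentry", "hittable")),
    "extract_ids": (("aablastentry", "hittable"), ("accession", "idlist")),
    "getentry": (("accession", "idlist"), ("aa_sequence", "multifasta")),
    "clustalw": (("aa_sequence", "multifasta"), ("aamultiplealignment", "gde")),
}


def find_workflow(tasks, source, target):
    """Return the shortest list of task names turning source into target."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        spec, path = queue.popleft()
        if spec == target:
            return path
        for name, (task_in, task_out) in tasks.items():
            # A task is applicable when its input meta data matches.
            if task_in == spec and task_out not in visited:
                visited.add(task_out)
                queue.append((task_out, path + [name]))
    return None  # no sequence of tasks connects the two specifications


if __name__ == "__main__":
    print(find_workflow(TASKS, ("aa_sequence", "fasta"),
                        ("aamultiplealignment", "gde")))
```

The user supplies only the input and output specifications; the intermediate tasks are found by matching each task's output meta data to the next task's input meta data.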
Meta Information to Specify the Functionality of a Task
TASK
Meta Data for Input: samples {na_sequence,fasta}, {aa_sequence,fasta}
Meta Data for Database: samples {uniprot}, {nt}
Meta Information for Command and Options: {blastn}, {getentry}
Meta Data for Output: samples {ddbjentry,flatfile}, {aablastentry,hittable}
Task Hierarchy (is_a)
[Figure: is_a hierarchy from abstract to concrete tasks, each annotated with an input type (I) and an output type (O)]
Abstract Task: Homology search
BLAST, FASTA, SSEARCH
Concrete Task: blastp, blastx, blastn, fasta, rfasta, ...
Legend:
I : Input Type
O : Output Type
S : Sequence or Sequence Name
V : Various Type
N : Nucleotide Sequence
A : Amino Acid Sequence
id : Accession ID
E : Database Entry
Task Hierarchy (has_a)
Well-known / user-defined task: Genome Annotation [glimmer2, blastn, getEntry]
[Figure: Genome Annotation is composed of glimmer2, blastn and getEntry, each annotated with its input and output types]
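A has_a relation can be read as composition: a composite task takes the input type of its first sub-task and the output type of its last, provided the intermediate types chain together. The sketch below makes that check explicit. The type letters assigned to each sub-task are assumptions chosen for illustration (the annotations in the figure did not survive extraction), and `compose` is a hypothetical helper.

```python
def compose(name, subtasks):
    """Build a composite task from (name, input_type, output_type) sub-tasks.

    Raises ValueError when one sub-task's output does not match the
    next sub-task's input.
    """
    if not subtasks:
        raise ValueError("a composite task needs at least one sub-task")
    for (prev_name, _, prev_out), (next_name, next_in, _) in zip(subtasks, subtasks[1:]):
        if prev_out != next_in:
            raise ValueError(
                f"{prev_name} outputs {prev_out} but {next_name} expects {next_in}"
            )
    return (name, subtasks[0][1], subtasks[-1][2])


# Assumed types: N = nucleotide sequence, S = sequence, id = accession ID,
# E = database entry. The assignments below are illustrative only.
GENOME_ANNOTATION = compose(
    "Genome Annotation",
    [("glimmer2", "N", "S"), ("blastn", "S", "id"), ("getEntry", "id", "E")],
)

if __name__ == "__main__":
    print(GENOME_ANNOTATION)
```

Once composed, the user-defined task can be treated like any atomic task in path finding.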
Prototype for ‘Proof of Concept’
• Language: tuProlog
– Java to Prolog
– Prolog to Java
• Web Service Interface through Java API
• Task Database
– Prolog Clause Database
• Optimal Path Finding
– Bidirectional Breadth-First Search Algorithm
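Bidirectional breadth-first search expands one level from the input side and one level from the output side in turn, and stops when the two frontiers meet. The sketch below is a generic Python version under stated assumptions, not the tuProlog implementation of the prototype. The edge list is hypothetical, and `bidirectional_bfs` is an illustrative name.

```python
from collections import deque

# Hypothetical tasks as (name, input data type, output data type).
EDGES = [
    ("blastp", "aa_sequence", "hittable"),
    ("blastn", "na_sequence", "hittable"),
    ("extract_ids", "hittable", "idlist"),
    ("getentry", "idlist", "sequences"),
    ("clustalw", "sequences", "multiple_alignment"),
    ("treeview", "multiple_alignment", "phylogenetic_tree"),
]


def bidirectional_bfs(edges, source, target):
    """Return a task sequence with the fewest steps, or None if unreachable."""
    if source == target:
        return []
    forward, backward = {}, {}
    for task, t_in, t_out in edges:
        forward.setdefault(t_in, []).append((task, t_out))
        backward.setdefault(t_out, []).append((task, t_in))
    # Each map stores: node -> (neighbor closer to that side's root, task).
    from_source, from_target = {source: None}, {target: None}
    frontier_s, frontier_t = deque([source]), deque([target])

    def expand(frontier, adjacency, seen, other_seen):
        """Expand one full level; return the meeting node if the sides touch."""
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for task, nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen[nxt] = (node, task)
                    if nxt in other_seen:
                        return nxt
                    frontier.append(nxt)
        return None

    def build_path(meet):
        """Join the source-side and target-side halves at the meeting node."""
        tasks, node = [], meet
        while from_source[node] is not None:
            node, task = from_source[node]
            tasks.append(task)
        tasks.reverse()
        node = meet
        while from_target[node] is not None:
            node, task = from_target[node]
            tasks.append(task)
        return tasks

    while frontier_s and frontier_t:
        meet = expand(frontier_s, forward, from_source, from_target)
        if meet is None:
            meet = expand(frontier_t, backward, from_target, from_source)
        if meet is not None:
            return build_path(meet)
    return None


if __name__ == "__main__":
    print(bidirectional_bfs(EDGES, "aa_sequence", "phylogenetic_tree"))
```

Searching from both ends keeps each frontier shallow, which matters when the task database holds many tasks with overlapping input and output types.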
System Overview
User: User Specification, User Data, Result
Workflow System: UI, Prolog Engine (tuProlog), Web Service Library (Java), Workflow Library (Prolog), Workflow, Workflow Execution
Knowledge Base: Task Database (1697), Web Service Information (1596)
Server: SPBIO, DDBJ
Screen Snapshot (Workflow Generation Phase)
Screen Snapshot (Workflow Execution Phase)
Phylogenetic Tree Obtained by a Generated Workflow
when applied to a Human Insulin Sequence
P61982 [Mus musculus]
P61981 [Homo sapiens]
Q5RC20 [Pongo pygmaeus]
P61983 [Rattus norvegicus]
Q5F3W6 [Gallus gallus]
P68252 [Bos taurus]
Q6PCG0 [Xenopus laevis]
Q6NRY9 [Xenopus laevis]
Q04917 [Homo sapiens]
P68509 [Bos taurus]
P68511 [Rattus norvegicus]
P68510 [Mus musculus]
Q6UFZ2 [Oncorhynchus mykiss]
Q6PC29 [Brachydanio rerio]
Q6UFZ3 [Oncorhynchus mykiss]
Lessons from our First Experience
Task Database (prototype)
Category | Task | Count
Web Service Call | DDBJ Blast | 453
Web Service Call | DDBJ SRS | 638
Web Service Call | DDBJ GetEntry | 38
Web Service Call | DDBJ ClustalW | 62
Web Service Call | SPBIO Blast | 405
Format Transformation | | 56
Data Selection | | 45
In Total | | 1697
Test Set of Specification
No. | Input (Format, Type) | Output (Format, Type) | Applications
1 | fasta, aasequence | gde, aamultiplealignment | blastp uniprot; filter num25; getfasta_swissentry; multiplealignment
2 | | | blastp uniprot; filter num25; getfasta_swissentry; multiplealignment
3 | fasta, aasequence | gde, aamultiplealignment | alignmentsearch; filter; getentry; multiplealignment
4 | fasta, aasequence | | filter; getentry; multiplealignment
Differences of Generated Workflow
Workflow 1 (ID, Description)
1052 alignment by blastp from UNIPROT
102 identify data format and content.
104 extract SeqIdentifier from ddbj BLAST result record
109 extract Swiss-Prot ACNumber from SequenceIdentifier
10005 idlist[25] from idlist[??]
3107 Get SWISSPROT entry of FASTA Format by Accession Number.
129 multi fasta format from fasta list
4018 multiplealignment by clustalw with blosum
Workflow 2 (ID, Description)
1052 alignment by blastp from UNIPROT
102 identify data format and content.
104 extract SeqIdentifier from ddbj BLAST result record
109 extract Swiss-Prot ACNumber from SequenceIdentifier
1005 idlist[25] from idlist[??]
3107 Get SWISSPROT entry of FASTA Format by Accession Number.
129 multi fasta format from fasta list
4001 multiplealignment by clustalw with blosum
Workflow 3 (ID, Description)
1043 alignment by blastp from DAD
102 identify data format and content.
104 extract SeqIdentifier from ddbj BLAST result record
109 extract Swiss-Prot ACNumber from SequenceIdentifier
10001 idlist[25] from idlist[??]
3107 Get SWISSPROT entry of FASTA Format by Accession Number.
129 multi fasta format from fasta list
4018 multiplealignment by clustalw with blosum
Workflow 4 (ID, Description)
1043 alignment by blastp from DAD
102 identify data format and content.
104 extract SeqIdentifier from ddbj BLAST result record
109 extract Swiss-Prot ACNumber from SequenceIdentifier
10001 idlist[5] from idlist[??]
3107 Get SWISSPROT entry of FASTA Format by Accession Number.
129 multi fasta format from fasta list
4001 multiplealignment by clustalw with blosum
No. | SolutionNum | time (ms) | Meta Data
1 | 8 | 11266 | Input, Database, Output, Full Cmds
2 | 41 | 46704 | No Input, Database, No Output, Full Cmds
3 | over 100 | 249906 | Input, No DB, No Output, Partial Cmds
4 | over 100 | 25297 | Input, No DB, Output, Partial Cmds
First Match / WebServiceCall: X? (No. 4), X?
Why Failed?
Lack of Interoperability Between the Web Services
Input: Amino Acid Sequence -> blastp (UNIPROT) -> Output: HitTable
Input: Amino Acid Sequence -> blastp (DAD) -> Output: HitTable
Very Similar but Not the Same Format
Blastp for UniProt:
sp|Q8HXV2|INS_PONPY Insulin precursor [Contains: Insulin B chain... 171 4e-43
Blastp for DAD:
L15440-1|AAA59179.1| 107|Homo sapiens insulin protein. 177 1e-43
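The two hit-table lines above carry the same kind of information (identifier, description, score, E-value) in different layouts, so a downstream task needs one parser per service. The sketch below normalizes both into a common record. The field interpretations (for example, treating the first DAD field as the entry identifier and the third as a length) are assumptions, and `parse_hit` is a hypothetical helper written only against these two sample lines.

```python
import re

# UniProt-style line: sp|ACCESSION|ENTRY_NAME description score evalue
UNIPROT_HIT = re.compile(
    r"^sp\|(?P<id>[^|]+)\|(?P<entry_name>\S+)\s+(?P<description>.*?)"
    r"\s+(?P<score>\d+)\s+(?P<evalue>\S+)$"
)
# DAD-style line: ENTRY|PROTEIN_ID| LENGTH|description score evalue
DAD_HIT = re.compile(
    r"^(?P<id>[^|]+)\|(?P<protein_id>[^|]+)\|\s*(?P<length>\d+)\|"
    r"(?P<description>.*?)\s+(?P<score>\d+)\s+(?P<evalue>\S+)$"
)


def parse_hit(line):
    """Normalize one hit-table line from either service into a common record."""
    for source, pattern in (("uniprot", UNIPROT_HIT), ("dad", DAD_HIT)):
        match = pattern.match(line.strip())
        if match:
            return {
                "source": source,
                "id": match.group("id"),
                "score": int(match.group("score")),
                "evalue": float(match.group("evalue")),
            }
    raise ValueError(f"unrecognized hit line: {line!r}")


UNIPROT_LINE = "sp|Q8HXV2|INS_PONPY Insulin precursor [Contains: Insulin B chain... 171 4e-43"
DAD_LINE = "L15440-1|AAA59179.1| 107|Homo sapiens insulin protein. 177 1e-43"

if __name__ == "__main__":
    print(parse_hit(UNIPROT_LINE))
    print(parse_hit(DAD_LINE))
```

A task that extracts accession numbers for one layout fails silently or wrongly on the other, which is why both outputs cannot share the single type "HitTable" without further meta information.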
Conclusion
• Web Services have great potential for sharing bioinformatics data and tools all over the world
• Automatic workflow generation tools are needed to make full use of Web Services
• Bioinformatics ontology is key to establishing interoperability among bioinformatics Web Services
Acknowledgement
• Daisuke Shinbara Tokyo Institute of Technology (Hitachi, ltd.)
• Sumi Yoshikawa RIKEN GSC, TITECH
References
Akihiko Konagaya: “Bioinformatics Ontology: Towards the Automatics Generation of Bioinformatics Workflow for Web Services,” in Proc. of Distributed Applications, Web Services, Tools and GRID Infrastructures for Bioinformatics (NETTAB 2006), S. Margherita di Pula, Italy (http://www.nettab.org/2006/), pp. 75-82 (2006)
Akihiko Konagaya: “OBIGrid: Towards the 'Ba' for Sharing Resources, Services and Knowledge for Bioinformatics,” in Proc. of Fourth International Workshop on Biomedical Computations on the Grid (BioGrid), Singapore (CCGRID 2006), 37 (2006)
Thank You for Listening