Migrating to the Semantic Web: Bioinformatics as a case study. Phillip Lord, Dept of Computer Science, University of Manchester


What is the Semantic Web

OWL, RDF, XML

We are here!

The talk

• Three (and a half) example case studies
• Two different technologies.
• Why we choose the different technologies.

RDF in a nutshell

Tim Berners-Lee’s original vision… 1989

OWL in a nutshell

The Motivation

“At the doctor’s office, Lucy instructed her semantic web agent. It promptly retrieved information about her Mom’s prescribed treatment, looked up a list of several providers within 20 miles of home, with a good trust rating.”

Scientific American, May 2001:

Beware of the Hype!

The Motivating Example

Lucy, Doctor

myGrid

• UK e-Science Pilot Project.
• Oct 2001 – April 2005.
• £3.4 million.
• £0.4 million studentships.
• Sites: Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield.

Data(type)-intensive bioinformatics

ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
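
As a rough illustration of why records like this are datatype-intensive, the sketch below groups the lines of one flat-file entry by their two-letter line code. The function name `parse_flat_file` and the shortened sample text are assumptions made for illustration; this is not a complete parser for the format.

```python
from collections import defaultdict

def parse_flat_file(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        code, value = line[:2], line[2:].strip()
        # Sequence continuation lines start with spaces; file them under "SQ".
        if code.strip() == "":
            code = "SQ"
        fields[code].append(value)
    return dict(fields)

# Shortened sample based on the entry above (sequence cut down for brevity).
SAMPLE = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG
"""

entry = parse_flat_file(SAMPLE)
# Keywords are separated by ";" and the list ends with ".".
keywords = [k.strip() for k in entry["KW"][0].rstrip(".").split(";")]
print(entry["ID"][0].split()[0])
print(keywords)
```

Even this small entry mixes identifiers, free text, controlled keywords, positional features and raw sequence, each of which a downstream service has to interpret differently.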

[Figure: myGrid architecture, built on a Web Service (Grid Service) communication fabric. Recoverable labels follow.]

• Layers (the Service Stack): Applications, Core services, External services.
• Side labels: Bioinformaticians, Tool Providers, Service Providers.
• Components: AMBIT Text Extraction Service, Provenance, Personalisation, Event Notification, Gateway, Service and Workflow Discovery, myGrid Information Repository, Ontology Mgt, Metadata Mgt, Work bench, Taverna, Talisman, Native Web Services, SoapLab, GowLab, Web Portal, Legacy apps, Registries, Ontologies, Views, FreeFluo Workflow Enactment Engine, OGSA-DQP Distributed Query Processor.

WBS Workflows:

[Figure: workflow diagram. Recoverable labels follow.]

• Inputs and intermediate data: GenBank Accession No; GenBank Entry; Nucleotide seq (Fasta); Coding sequence; ORFs; 6 ORFs; Amino Acid translation; URL inc GB identifier; Query nucleotide sequence.
• Services: Seqret, GenScan, prettyseq, restrict, cpgreport, RepeatMasker, ncbiBlastWrapper, sixpack, transeq, epestfind, pepcoil, pepstats, pscan, SignalP, TargetP, PSORTII, InterPro, PFAM, Prosite, Smart, Pepwindow?, Octanol?
• Result annotations: Restriction enzyme map; CpG Island locations and %; Repetitive elements; Translation/sequence file (good for records and publications); Blastn vs nr, est databases; tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr; Identifies PEST seq; Identifies FingerPRINTS; MW, length, charge, pI, etc.; Predicts coiled-coil regions; Hydrophobic regions; Predicts cellular location; Identifies functional and structural domains/motifs; Sort for appropriate sequences only.
• Colour key: Pink: outputs/inputs of a service. Purple: tailor-made services. Green: Emboss soaplab services. Yellow: Manchester soaplab services. Grey: unknowns.

Semantic discovery

• Query-ontology – discovering workflows and services described in the registry by building a query in Taverna.

• A common ontology is used to annotate and query.

• Look for all workflows that accept an input of semantic type nucleotide sequence.

• Aim to offer semantic discovery over the public view on the Web.
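
The query described above can be sketched as a lookup over annotated registry entries. Everything in this example is hypothetical: the concept names in `IS_A`, the workflow names in `REGISTRY`, and the helper functions are invented for illustration and are not the myGrid ontology or the Taverna API. The point is only that a shared ontology lets a query for one concept also match entries annotated with its subconcepts.

```python
# Hypothetical mini-ontology: child concept -> parent concept.
IS_A = {
    "nucleotide sequence": "biological sequence",
    "protein sequence": "biological sequence",
    "DNA sequence": "nucleotide sequence",
}

# Hypothetical registry: workflow name -> semantic type of its input.
REGISTRY = {
    "blast_search": "nucleotide sequence",
    "find_orfs": "DNA sequence",
    "predict_location": "protein sequence",
}

def is_subtype(concept, ancestor):
    """True if `concept` equals `ancestor` or reaches it via is-a links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False

def discover(input_type):
    """Return workflows whose annotated input type falls under `input_type`."""
    return sorted(name for name, t in REGISTRY.items() if is_subtype(t, input_type))

print(discover("nucleotide sequence"))
```

With these made-up entries the query returns both the workflow annotated with "nucleotide sequence" and the one annotated with the narrower "DNA sequence", which a plain keyword match would miss.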

Service annotation

• Adding structured metadata to a workflow registration so that others can discover and reuse it more effectively, for example the semantic type of input it accepts.

Semantic Discovery

View annotations on workflow

Pedro data capture tool

Drag a workflow entry into the explorer pane and the workflow loads.

Drag a service/workflow to the scavenger window for inclusion into the workflow.

Biologist

Ontologist

Service Providers

Problems when doing In Silico Experiments

Experiments are performed repeatedly, at different sites, at different times, by different users or groups.

Scientists

In silico experiments:

A large repository of records about experiments!!
• verification of data;
• “recipes” for experiment designs;
• explanation for the impact of changes;
• ownership;
• performance of services;
• data quality;

The Current State of the Art

Tim Berners-Lee’s original vision… 1989

A Semantic Web of Provenance

[Figure: a web of linked provenance resources. Recoverable labels follow.]

• Question labels: what; why; how; who; how/which/when/where.
• Linked resources: literature relevant to the provenance study or data in this workflow; DAML+OIL ontologies linking provenance documents; experiment notes; interlinking graph of the workflow that generates the provenance logs; web page of people who have related interests as the owner of the workflow; provenance record of a workflow run.
• Document formats shown: XML, HTML, PDF.
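
One way to picture such a web of provenance is as a set of subject, predicate, object statements in the style of RDF. The sketch below is an assumption-laden toy: the `ex:` identifiers, the predicate names and the `ask` helper are all invented for illustration and do not reflect the myGrid provenance schema.

```python
# Each statement is a (subject, predicate, object) triple, in the style of RDF.
# All identifiers below are made up for illustration.
TRIPLES = [
    ("ex:run42", "ex:ranWorkflow", "ex:wbs_annotation"),
    ("ex:run42", "ex:launchedBy", "ex:alice"),
    ("ex:run42", "ex:startedAt", "2004-03-01T10:15:00"),
    ("ex:run42", "ex:describedIn", "ex:experiment_notes"),
    ("ex:wbs_annotation", "ex:usesService", "ex:RepeatMasker"),
]

def ask(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

# "who" launched the run, and "how" (which services) it was carried out.
who = ask("ex:run42", "ex:launchedBy")
how = [svc
       for wf in ask("ex:run42", "ex:ranWorkflow")
       for svc in ask(wf, "ex:usesService")]
print(who, how)
```

Because every statement has the same shape, records held in different documents can be linked simply by sharing identifiers.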

Population Semantic Data

Web Services

Taverna

FreeFluo

Metadata Repository

Data Repository

LaunchPad Haystack

Haystack from IBM

Biologist

Biologist

Database

Biologist

Gene Ontology Next Generation Project (GONG)

• Demonstrate the utility of finer grained concept descriptions in DAML+OIL (OWL-DL)

• Develop methodologies and tools to support the process

Translating theory into practice

• Gene Ontology provides a service to the model organism database community

• Description logic (DL) is a technology born out of computer science research

• OWL is a standard ontology interchange language underpinned by DL

GONG - proof of concept

• Maintaining an exhaustive is-a structure

GO concept

Is-a relationship

Parent

Axis 1: Chemicals

[chemical] biosynthesis (GO:0009058)
  [i] carbohydrate biosynthesis (GO:0016051)
    [i] aminoglycan biosynthesis (GO:0006023)
      [i] heparin biosynthesis (GO:0030210)

Example: heparin biosynthesis

Axis 1: Chemicals
Axis 2: Process

[chemical] biosynthesis (GO:0009058)
  [i] carbohydrate biosynthesis (GO:0016051)
    [i] aminoglycan biosynthesis (GO:0006023)
      [i] heparin biosynthesis (GO:0030210)

[i] heparin metabolism (GO:0030202)
  [i] heparin biosynthesis (GO:0030210)

Example: heparin biosynthesis

Axis 1: Chemicals
Axis 2: Process

[chemical] biosynthesis (GO:0009058)
  [i] carbohydrate biosynthesis (GO:0016051)
    [i] aminoglycan biosynthesis (GO:0006023)
      [i] heparin biosynthesis (GO:0030210)

[i] glycosaminoglycan biosynthesis (GO:0006024)

[i] heparin metabolism (GO:0030202)
  [i] heparin biosynthesis (GO:0030210)

Example: heparin biosynthesis

Is this important?

• Missing is-a not noticed by users

• BUT… improves fidelity of DB record retrieval.

– Asking for gene products involved in ‘glycosaminoglycan biosynthesis’ will lead to an additional result:

O94923 SPTr ISS - D-glucuronyl C5-epimerase (Fragment)
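
The effect on retrieval can be sketched with a small ancestor lookup. The GO identifiers come from the slides; the direct parents listed for heparin biosynthesis are read off the hierarchy shown earlier, and the annotation of O94923 directly to heparin biosynthesis is an assumption that the slide implies but does not state. The helper names are hypothetical.

```python
def ancestors(term, is_a):
    """All terms reachable from `term` by following is-a links upwards."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def products_for(term, is_a, annotations):
    """Gene products annotated to `term` or to any of its descendants."""
    return sorted(p for p, t in annotations.items()
                  if t == term or term in ancestors(t, is_a))

HEPARIN_BIOSYNTHESIS = "GO:0030210"
GAG_BIOSYNTHESIS = "GO:0006024"  # glycosaminoglycan biosynthesis

# Assumption: O94923 is annotated directly to heparin biosynthesis.
annotations = {"O94923": HEPARIN_BIOSYNTHESIS}

# is-a links before and after the missing link is added.
before = {HEPARIN_BIOSYNTHESIS: {"GO:0006023", "GO:0030202"}}
after = {HEPARIN_BIOSYNTHESIS: {"GO:0006023", "GO:0030202", GAG_BIOSYNTHESIS}}

print(products_for(GAG_BIOSYNTHESIS, before, annotations))  # link missing
print(products_for(GAG_BIOSYNTHESIS, after, annotations))   # link present
```

Under these assumptions the first query returns nothing and the second returns O94923, which is the additional result the slide refers to.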

Paraphrased reasoning process

• heparin biosynthesis
  – class heparin biosynthesis defined
      subClassOf biosynthesis
      restriction onProperty acts_on hasClass heparin

• glycosaminoglycan biosynthesis
  – class glycosaminoglycan biosynthesis defined
      subClassOf biosynthesis
      restriction onProperty acts_on hasClass glycosaminoglycan

Is-a

Inferring a new is-a link

• heparin biosynthesis
  – class heparin biosynthesis defined
      subClassOf biosynthesis
      restriction onProperty acts_on hasClass heparin

• glycosaminoglycan biosynthesis
  – class glycosaminoglycan biosynthesis defined
      subClassOf biosynthesis
      restriction onProperty acts_on hasClass glycosaminoglycan

Is-a

Is-a
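
The inference paraphrased above can be imitated with a few lines of code. This is a hand-rolled stand-in for what a description logic reasoner does, covering only this one pattern (same parent process, one `acts_on` filler subsumed by the other). It assumes the chemical hierarchy already asserts that heparin is-a glycosaminoglycan; the `Definition` class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the chemical hierarchy asserts heparin is-a glycosaminoglycan.
CHEMICAL_IS_A = {"heparin": "glycosaminoglycan"}

@dataclass(frozen=True)
class Definition:
    name: str
    parent: str    # the subClassOf target
    acts_on: str   # filler of the acts_on restriction

def chemical_subsumed(child, ancestor):
    """True if `child` equals `ancestor` or reaches it via chemical is-a links."""
    while child is not None:
        if child == ancestor:
            return True
        child = CHEMICAL_IS_A.get(child)
    return False

def infer_is_a(definitions):
    """Infer A is-a B when both share a parent and A's chemical is-a B's chemical."""
    links = []
    for a in definitions:
        for b in definitions:
            if (a != b and a.parent == b.parent
                    and chemical_subsumed(a.acts_on, b.acts_on)):
                links.append((a.name, b.name))
    return links

defs = [
    Definition("heparin biosynthesis", "biosynthesis", "heparin"),
    Definition("glycosaminoglycan biosynthesis", "biosynthesis", "glycosaminoglycan"),
]
print(infer_is_a(defs))
```

The inferred link runs in one direction only: heparin biosynthesis is-a glycosaminoglycan biosynthesis, because the is-a between the chemicals carries over to the processes that act on them.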

Results

• Carbohydrate metabolism, ~250 concepts
  – 22 additional is-a links, 17 of which are now in GO

• Amino acid metabolism, ~250 concepts
  – Further 17 additional is-a links now in GO

• GO team will be reviewing results for metabolism as a whole once we have the tools to support the process

• Useful results come from even a partial coverage

Build a practical environment

• Tools needed for:
  – Creating OWL definitions
  – Tracking changes
  – Reporting reasoning results
  – Viewing definitions
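
A reporting step of the kind listed above could be as simple as comparing the asserted is-a links with those a reasoner returns. The sketch below assumes both are available as (child, parent) pairs; the function name and the sample links are illustrative, not the output of any actual GONG tool.

```python
def report_new_links(asserted, inferred):
    """Return inferred is-a links that are not already asserted, in sorted order."""
    return sorted(set(inferred) - set(asserted))

# Illustrative (child, parent) pairs.
asserted = [("heparin biosynthesis", "heparin metabolism")]
inferred = [
    ("heparin biosynthesis", "heparin metabolism"),
    ("heparin biosynthesis", "glycosaminoglycan biosynthesis"),
]

for child, parent in report_new_links(asserted, inferred):
    print(f"NEW: {child} is-a {parent}")
```

A list like this gives ontology editors a short set of candidate changes to review rather than a full reclassified hierarchy.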

Reporting tools

OWL for GONG

Biologist

Ontologist

Conclusions

• Three problems, three different solutions, all making use of semantic web technologies.

• A little semantics can go a long way.

• The expressivity of the language has to be chosen at least in part based on the tasks to be performed, and the user base.

• Tools, tools, tools.

Acknowledgments

• Jane Lomax and Midori Harris of the GO editorial team for help and advice and responding to the suggested changes

• UMLS and MeSH which provided valuable resources for chemical information

• Sean Bechhofer for development on OilEd

• Project funded as a subcontract of the DARPA DAML programme

Chris Wroe, Robert Stevens, Carole Goble, University of Manchester, UK
Michael Ashburner, EBI, Hinxton, UK

Acknowledgements

myGrid is an EPSRC funded UK eScience Program Pilot Project

Particular thanks to the other members of the Taverna project, http://taverna.sf.net

myGrid People

Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.

Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK

Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire

Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)

Collaborators
• Keith Decker