The Semantic Web: New-style data-integration (and how it works for life-scientists too!)
Frank van Harmelen, AI Department, Vrije Universiteit Amsterdam

Page 1

The Semantic Web: New-style data-integration
(and how it works for life-scientists too!)

Frank van Harmelen
AI Department
Vrije Universiteit Amsterdam

Page 2

What’s the problem?

(data-mess in bio-inf)

Page 3

Pharmaceutical Productivity

Source: PhRMA & FDA 2003

Page 4

The Industry’s Problem

Too much unintegrated data:
– from a variety of incompatible sources
– no standard naming convention
– each with a custom browsing and querying mechanism (no common interface)
– and poor interaction with other data sources

Kenneth Griffiths and Richard Resnick, Tutorial at Intelligent Systems for Molecular Biology, 2003

Page 5

What are the Data Sources?

• Flat Files
• URLs
• Proprietary Databases
• Public Databases
• Data Marts
• Spreadsheets
• Emails
• …

Page 6

Sample Problem: Hyperprolactinemia

Overproduction of prolactin:
– prolactin stimulates mammary gland development and milk production

Hyperprolactinemia is characterized by:
– inappropriate milk production
– disruption of menstrual cycle
– can lead to conception difficulty

Page 7

Understanding transcription factors for prolactin production

“Show me all genes in the public literature that are putatively related to hyperprolactinemia, have more than 3-fold expression differential between hyperprolactinemic and normal pituitary cells, and are homologous to known transcription factors.”

Q1 (SEQUENCE): “Show me all genes that are homologous to known transcription factors”

Q2 (EXPRESSION): “Show me all genes that have more than 3-fold expression differential between hyperprolactinemic and normal pituitary cells”

Q3 (LITERATURE): “Show me all genes in the public literature that are putatively related to hyperprolactinemia”

(Q1 ∩ Q2 ∩ Q3)
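
The decomposition on this slide amounts to running one query per source and intersecting the answers. A minimal sketch in Python, assuming each source returns a set of gene identifiers; the gene IDs and the three result sets are invented placeholders, not real findings, and `integrate` is a hypothetical helper name:

```python
# Hypothetical result sets, one per data source (placeholder gene IDs).
q1_sequence = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # homologous to known transcription factors
q2_expression = {"GENE_B", "GENE_C", "GENE_E"}          # more than 3-fold expression differential
q3_literature = {"GENE_C", "GENE_B", "GENE_F"}          # related to hyperprolactinemia in the literature


def integrate(*result_sets):
    """Return the genes present in every result set (Q1 ∩ Q2 ∩ Q3)."""
    if not result_sets:
        return set()
    return set.intersection(*map(set, result_sets))


candidates = integrate(q1_sequence, q2_expression, q3_literature)
print(sorted(candidates))
```

The intersection is only meaningful if all three sources use the same identifier for the same gene, which is exactly what the lack of a standard naming convention prevents.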

Page 8

The Medical Tower of Babel

• MeSH: Medical Subject Headings, National Library of Medicine; 22.000 descriptions
• EMTREE: commercial (Elsevier), drugs and diseases; 45.000 terms, 190.000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCI Cancer Ontology: 17.000 classes (about 1M definitions)

Page 9

Stitching this all together by hand?

Source: Stephens et al. J Web Semantics 2006

Page 10

Why would Semantic technology help?

Page 12

What is meta-data?

• it's just data
• it's data describing other data
• it's meant for machine consumption

Diagram labels (example meta-data fields): disease, name, symptoms, drug, administration
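
To make "data describing other data" concrete, here is a small Python sketch. Only the field names (disease, name, symptoms, drug, administration) come from the slide; the page URL, the field values, and the `describes` helper are hypothetical.

```python
import json

# The resource itself: a page written for human readers (hypothetical URL).
resource = "http://example.org/pages/hyperprolactinemia.html"

# Meta-data about that resource: plain data, but structured for machines.
metadata = {
    "about": resource,
    "disease": {
        "name": "hyperprolactinemia",
        "symptoms": ["inappropriate milk production", "disruption of menstrual cycle"],
    },
    "drug": {
        "name": "example-drug",    # placeholder, not a real recommendation
        "administration": "oral",  # placeholder value
    },
}


def describes(meta, field, value):
    """Check whether a meta-data record states that `field` has the given name."""
    return meta.get(field, {}).get("name") == value


# Meta-data is just data: it survives serialisation and can be read back by any program.
serialized = json.dumps(metadata)
restored = json.loads(serialized)

# A program can answer "is this page about disease X?" without parsing the prose.
print(describes(restored, "disease", "hyperprolactinemia"))
```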

Page 13

Required are:
1. one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached

Also: mechanisms for attribution and trust (is this page really about Pamela Anderson?)

Page 14

What are ontologies & what are they used for

• no shared understanding
• Conceptual and terminological confusion
• Actors: both humans and machines
• Agree on a conceptualization
• Make it explicit in some language.

Diagram labels: world, concept, language

Page 15

Standard vocabularies (“Ontologies”)

• Identify the key concepts in a domain
• Identify a vocabulary for these concepts
• Identify relations between these concepts
• Make these precise enough so that they can be shared between:
  – humans and humans
  – humans and machines
  – machines and machines

Page 16

Biomedical ontologies (a few..)

• MeSH: Medical Subject Headings, National Library of Medicine; 22.000 descriptions
• EMTREE: commercial (Elsevier), drugs and diseases; 45.000 terms, 190.000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCI Cancer Ontology: 17.000 classes (about 1M definitions)

Page 17

Remember “required are”:
✓ one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached

Page 18

Stack of languages

Page 19

Stack of languages

• XML: Surface syntax, no semantics
• XML Schema: Describes structure of XML documents
• RDF: Datamodel for “relations” between “things”
• RDF Schema: RDF Vocabulary Definition Language
• OWL: A more expressive Vocabulary Definition Language
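
The step from the RDF layer to the RDF Schema layer can be sketched without any library. The following Python is a toy illustration only: triples are plain tuples, the `ex:` names are invented, and the single rule shown (type propagation along `rdfs:subClassOf`) is just one of the RDFS entailment rules, not a full reasoner.

```python
# RDF layer: a set of (subject, predicate, object) triples relating "things".
triples = {
    ("ex:prolactin", "rdf:type", "ex:PeptideHormone"),
    # RDF Schema layer: vocabulary definitions (hypothetical class hierarchy).
    ("ex:PeptideHormone", "rdfs:subClassOf", "ex:Hormone"),
    ("ex:Hormone", "rdfs:subClassOf", "ex:SignalingMolecule"),
}


def infer_types(graph):
    """Apply 'x type C, C subClassOf D => x type D' until nothing new appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for instance, cls in typed:
            for sub, sup in subclass:
                new = (instance, "rdf:type", sup)
                if cls == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


closure = infer_types(triples)
# The schema lets a machine conclude something that was never stated explicitly.
print(("ex:prolactin", "rdf:type", "ex:SignalingMolecule") in closure)
```

XML alone would only fix how such statements are written down; the schema layer is what licenses the extra conclusion.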

Page 20

Remember “required are”:
✓ one or more standard vocabularies, so search engines, producers and consumers all speak the same language
✓ a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached

Page 21

Question: who writes the ontologies?

• Professional bodies, scientific communities, companies, publishers, …
  – See previous slide on Biomedical ontologies
  – Same developments in many other fields
• Good old fashioned Knowledge Engineering
• Convert from DB-schema, UML, etc.

Page 22

Question: Who writes the meta-data?

– Automated learning
– shallow natural language analysis
– Concept extraction

Example: Encyclopedia Britannica on “Amsterdam”
Diagram labels (extracted concepts): amsterdam, trade, antwerp, europe, merchant, city, town, center, netherlands
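
A very shallow form of concept extraction can be sketched as counting content words. This Python sketch is a toy under stated assumptions: the sample text and the stop-word list are made up for illustration, and it is not the method that produced the Britannica example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "was", "as", "for", "to", "its", "it", "with"}


def extract_concepts(text, top_n=5):
    """Return the top_n most frequent non-stop-words as candidate concepts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical sample text (not a quotation from any encyclopedia).
sample = (
    "Amsterdam is a city in the Netherlands. The city grew as a center of trade, "
    "and Amsterdam merchants traded with Antwerp and the rest of Europe."
)
print(extract_concepts(sample, top_n=3))
```

Frequency counting ignores word forms ("merchant" versus "merchants") and relations between terms, which is where a thesaurus or ontology would come in.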

Page 23

Question: Who writes the meta-data?

• exploit existing legacy-data
  – Databases
  – Lab equipment
  – (Amazon)
• side-effect from user interaction
  – email keyword extraction
• NOT from manual effort

Page 24

Remember “required are”:
✓ one or more standard vocabularies, so search engines, producers and consumers all speak the same language
✓ a standard syntax, so meta-data can be recognised as such
✓ lots of resources with meta-data attached

Page 25

Some working examples?

• DOPE

Page 26

DOPE: Background

• Vertical Information Provision
  – Buy a topic instead of a Journal!
  – Web provides new opportunities
• Business driver: drug development
  – Rich, information-hungry market
  – Good thesaurus (EMTREE)

Page 27

The Data

• Document repositories:
  – ScienceDirect: approx. 500.000 fulltext articles
  – MEDLINE: approx. 10.000.000 abstracts
• Extracted Metadata
  – The Collexis Metadata Server: concept-extraction ("semantic fingerprinting")
• Thesauri and Ontologies
  – EMTREE: 60.000 preferred terms, 200.000 synonyms

Page 28

Architecture:

Diagram labels: RDF Schema (EMTREE); Query interface; RDF over Datasource 1 … RDF over Datasource n
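
One way to read this architecture: each data source is wrapped so that it exposes RDF-style triples, and a query interface on top uses the shared vocabulary to query all sources uniformly. A minimal Python sketch under that reading follows; the class names, the two in-memory sources, and the `emtree:` identifiers are all hypothetical and not taken from the DOPE system.

```python
class TripleSource:
    """Wrapper that exposes one data source as (subject, predicate, object) triples."""

    def __init__(self, name, triples):
        self.name = name
        self.triples = list(triples)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in self.triples:
            if all(want is None or want == got for want, got in zip((s, p, o), triple)):
                yield triple


class QueryInterface:
    """Single entry point that fans a pattern query out to every wrapped source."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, s=None, p=None, o=None):
        return [(src.name, t) for src in self.sources for t in src.match(s, p, o)]


# Two hypothetical sources annotated with the same (invented) thesaurus term IDs.
source_1 = TripleSource("articles", [("doc:1", "about", "emtree:prolactin"),
                                     ("doc:2", "about", "emtree:aspirin")])
source_n = TripleSource("abstracts", [("abs:9", "about", "emtree:prolactin")])

ui = QueryInterface([source_1, source_n])
hits = ui.query(p="about", o="emtree:prolactin")
print(hits)
```

Because both sources annotate documents with terms from one vocabulary, a single pattern retrieves results from both without source-specific query code.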

Page 29

Page 30

Ontology disambiguates query

Page 31

Ontology groups results

Page 32

Ontology clusters results

Page 33

Ontology refines query
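
Query refinement with a thesaurus can be sketched as offering the narrower terms of the user's search term. The Python below is a toy sketch: the `NARROWER` hierarchy is invented for illustration and is not real EMTREE content, and both function names are hypothetical.

```python
# Hypothetical fragment of a thesaurus: term -> list of narrower terms.
NARROWER = {
    "hormone": ["pituitary hormone", "steroid hormone"],
    "pituitary hormone": ["prolactin", "growth hormone"],
}


def refinements(term):
    """Direct narrower terms the user could pick to refine a query."""
    return NARROWER.get(term, [])


def all_narrower(term):
    """Every term below `term` in the hierarchy (depth-first, no duplicates)."""
    found = []
    stack = list(reversed(refinements(term)))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(reversed(refinements(current)))
    return found


print(refinements("hormone"))
print(all_narrower("hormone"))
```

The same hierarchy could also drive grouping of results, by bucketing each hit under the broader term of its annotation.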

Page 34

Some working examples?

• DOPE
• HCLS (http://www.w3.org/2001/sw/hcls/)

Page 35

Architecture:

Diagram labels: RDF Schema (EMTREE); RDF Schema (Gene Ontology); …; Query interface; RDF over Datasource 1 … RDF over Datasource n

Page 36

Page 37

Summarising…

Data integration on the Web:
• machine processable data besides human processable data
• Syntax for meta-data
  – Representation
  – Inference
• Vocabularies for meta-data
  – Lots of them in bio-inf.
• Actual meta-data:
  – Lots in bio-inf.
• Will enable:
  – Better search engines (recall, precision, concepts)
  – Combining information across pages (inference)
  – …

Page 38

Things to do for you

• Practical: Use existing software to construct new use-scenarios
• Conceptual: Create an ontology for some area of bio-medical expertise
  – from scratch
  – as a refinement of an existing ontology
• Technical: Transform an existing data-set into meta-data format, and provide a query interface (for humans and machines)
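
For the technical task, a minimal sketch of turning a tabular data-set into triples and querying it is shown below in Python. The CSV content, the `ex:` prefix, and the column names are assumptions made for illustration; a real project would likely use an RDF library and published vocabularies instead.

```python
import csv
import io

# Hypothetical data-set: one row per gene (placeholder values).
RAW = """gene,organism,fold_change
GENE_A,human,3.5
GENE_B,human,1.2
"""


def rows_to_triples(csv_text, key_column):
    """Turn each cell into a (subject, predicate, object) triple keyed on one column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "ex:" + row[key_column]
        for column, value in row.items():
            if column != key_column:
                triples.append((subject, "ex:" + column, value))
    return triples


def query(triples, s=None, p=None, o=None):
    """Pattern query usable by programs; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


graph = rows_to_triples(RAW, key_column="gene")
print(query(graph, p="ex:fold_change"))
```

The `query` function covers the machine-facing side; a human-facing interface could be a thin form or command-line layer over the same function.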