Triples & Access Jan Velterop

Presentation given at the inaugural meeting of the Concept Web Alliance, 8 May 2009

Page 1: Triples And Access

Triples & Access

Jan Velterop

Page 2: Triples And Access

“There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”

Mark Twain, Life on the Mississippi

Page 3: Triples And Access

O yeah?

We have far too few returns in terms of usable knowledge out of such overwhelming investment of fact!

A lot of fact is deeply hidden!

Page 4: Triples And Access

Current Knowledge Transfer

Needle transport

A metaphor (is Greek for ‘truck’ after all)

Page 5: Triples And Access

Information overload?

Too much knowledge?

Stop acquiring it?

Or organisation underload?

Unprecedented opportunity?

Just filtering it?

Lack of conceptual structure?

Page 6: Triples And Access

Information overload?

Too much knowledge?

Stop acquiring it?

Or organisation underload?

Unprecedented opportunity!

Just filtering it?

Lack of conceptual structure?

Page 7: Triples And Access

Another metaphor:

What is the use of water?

Page 8: Triples And Access

H2O

Drink (take in)

Page 9: Triples And Access

What is the use of information?

Page 10: Triples And Access

Read (take in)

Age to Know

Page 11: Triples And Access

Publish articles

Page 12: Triples And Access

Stretching the water metaphor:

It’s already raining – we must build the ark

Page 13: Triples And Access

The ‘animals’ to come on board:

Page 14: Triples And Access

Slide by Carl Lagoze (Cornell) – from this presentation: http://journal.webscience.org/112/3/orechem.pdf

Page 15: Triples And Access

Stretching the metaphor further:

If you need water, rain is free

Page 16: Triples And Access

But if you want quality control and convenience:

Page 17: Triples And Access

Curated

Co-occ

All Triples

Remove Ambiguity and Redundancy

Curated

Observational

Smart Triples

Inferred

Knowledge Space

(node 1, unique ID) (node 2, unique ID)

< Source concept > < Relations (edge) > < Target Concept >

class, date, value, owner, condition, DOI
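The triple model on this slide (two nodes with unique IDs, a relation edge, and a set of attributes) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `Triple`, the field names, the example IDs and the relation label are all hypothetical and are not taken from any Concept Web Alliance specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Triple:
    """Hypothetical record for one edge: source concept, relation, target concept."""

    source_id: str   # node 1, unique ID
    relation: str    # the edge between the two concepts
    target_id: str   # node 2, unique ID
    # Attributes listed on the slide; all optional in this sketch.
    triple_class: Optional[str] = None
    date: Optional[str] = None
    value: Optional[float] = None
    owner: Optional[str] = None
    condition: Optional[str] = None
    doi: Optional[str] = None

    def key(self) -> Tuple[str, str, str]:
        """Identity of the assertion, independent of its attributes."""
        return (self.source_id, self.relation, self.target_id)


# Made-up identifiers, for illustration only.
example = Triple(
    source_id="concept:0001",
    relation="associated_with",
    target_id="concept:0002",
    triple_class="curated",
    owner="hypothetical-curator",
)
```

Keeping the identity (`key()`) separate from the attributes means two triples that assert the same thing can be recognised as such even when their owner, date or DOI differ.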

Page 18: Triples And Access

<Type F1> Database facts (multiple attributes)

<Type F2> Community Annotations

<Type C1> Co-occurrence sentence (abstracts, e.g. PubMed)

<Type C2> Co-occurrence Full Text (publisher, e.g. Springer)

<Type A1> Concept Profile Match

<Type A3> Co-expression (gene expression databases)

<Type A4> Modelling hypothesis (e.g. Plectix, InWeb)

Graph Building (e.g. WikiPathways)

Multiple Triples

F+C+A+

C+A+

A+

Unique to Springer

Unique to Plectix

Unique to 101668678

T-Cell Development

Cancer Promoting Genes

Interleukin-7

(node 1, unique ID) (node 2, unique ID)

< Source concept > < Relations (edge) > < Target Concept >

class, date, value, author, condition, DOI
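The labels F+C+A+, C+A+ and A+ on this slide appear to describe which families of triple types support a given edge. A minimal sketch of that reading follows; the interpretation itself is an assumption, and the function name `evidence_label` is hypothetical. Only the type codes (F1, F2, C1, C2, A1, A3, A4) come from the slide.

```python
# Type codes as listed on the slide.
KNOWN_TYPES = {"F1", "F2", "C1", "C2", "A1", "A3", "A4"}


def evidence_label(type_codes):
    """Summarise which type families (F, C, A) support an edge.

    Assumes, as an interpretation of the slide, that 'F+C+A+' means the
    edge is backed by at least one F, one C and one A type triple.
    """
    unknown = set(type_codes) - KNOWN_TYPES
    if unknown:
        raise ValueError(f"unknown triple types: {sorted(unknown)}")
    families = {code[0] for code in type_codes}
    # Emit families in the fixed order F, C, A used on the slide.
    return "".join(f"{family}+" for family in "FCA" if family in families)


all_three = evidence_label({"F1", "C1", "A4"})
text_and_model = evidence_label({"C2", "A1"})
model_only = evidence_label({"A4"})
```

Under this reading, an edge labelled A+ is one found only through association or modelling, which matches the slide's "Unique to ..." annotations.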

Page 19: Triples And Access

Unique to 101668678

Page 20: Triples And Access

<Type F1> Database facts (multiple attributes)

<Type F2> Community Annotations

<Type C1> Co-occurrence sentence (abstracts, e.g. PubMed)

<Type C2> Co-occurrence Full Text (publisher, e.g. Springer)

<Type A1> Concept Profile Match

<Type A3> Co-expression (gene expression databases)

<Type A4> Modelling hypothesis (e.g. Plectix, InWeb)

Graph Building (e.g. WikiPathways)

Multiple Triples

F+C+A+

C+A+

A+

Unique to Springer

Unique to Plectix

Unique to 101668678

T-Cell Development

Cancer Promoting Genes

Interleukin-7

(node 1, unique ID) (node 2, unique ID)

< Source concept > < Relations (edge) > < Target Concept >

class, date, value, author, condition, DOI

Page 21: Triples And Access

Triples

Remove Ambiguity and Redundancy

Curated

Observational

Smart Triples

Inferred; constructed

Knowledge Space

(node 1, unique ID) (node 2, unique ID)

< Source concept > < Relations (edge) > < Target Concept >

class, date, value, owner, condition, etc.

In these areas significant value is added to the triples
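The "Remove Ambiguity and Redundancy" step can be sketched in two moves: map every label to one unique concept ID (ambiguity), then merge triples that end up with the same IDs while keeping their provenance (redundancy). The synonym table, the concept IDs, the relation label and the function names below are hypothetical; a real pipeline would rely on a curated terminology rather than a hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical synonym table: label (lower-cased) -> unique concept ID.
SYNONYMS = {
    "interleukin-7": "concept:IL7",
    "il-7": "concept:IL7",
    "t-cell development": "concept:TCD",
}


def resolve(label):
    """Remove ambiguity: map a free-text label to its unique concept ID.

    Raises KeyError for labels missing from the synonym table.
    """
    return SYNONYMS[label.strip().lower()]


def merge(raw_triples):
    """Remove redundancy: collapse triples sharing IDs, keeping all provenance."""
    merged = defaultdict(set)
    for source, relation, target, provenance in raw_triples:
        merged[(resolve(source), relation, resolve(target))].add(provenance)
    return dict(merged)


# Two raw observations of the same assertion, written differently.
raw = [
    ("Interleukin-7", "co-occurs_with", "T-Cell Development", "abstract"),
    ("IL-7", "co-occurs_with", "T-cell development", "full text"),
]
smart = merge(raw)
```

Here the two raw rows collapse into a single entry whose provenance set records both sources, so nothing is lost about where the assertion was observed.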

Page 22: Triples And Access

The ‘trustmark’ CWA™:

Triple ‘model’

Best practice

Interoperability

Et cetera

Page 23: Triples And Access

Download Concept Web Alliance certified triples

Includes edges from:

PubMed (400,000,000 sentences, 5,000,000,000 concept co-occurrences) (from public data)

Protein databases (UniProt, IntAct, PDB, HPRD – 75,000 human curated PPIs) (from public data)

Private expression data (3000 extra edges, by Merck) (from proprietary data)

InWeb edges (240,000 unique edges from 17 species) (from proprietary data)

Plectix edges (5,000 extra edges, PPI modeling) (from proprietary data)

Gene co-expression databases (GEO, Express… – 25 square genes) (from public data)

STRING edges (200,000 gene-gene edges) (from semi public data)

Reactome edges (240,000 unique edges from 17 species) (from proprietary data)

ChemSpider edges (25,000,000 chemicals) (from semi public data)

Wiki edges (WikEdge = WikiPathways, WikiProfessionals, OmegaWiki, WikiGenes)

Et Cetera