Presentation given at the inaugural meeting of the Concept Web Alliance, 8 May 2009
Triples & Access
Jan Velterop
“There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”
Mark Twain, Life on the Mississippi
O yeah?
We have far too few returns in terms of usable knowledge out of such overwhelming investment of fact!
A lot of fact is deeply hidden!
Current Knowledge Transfer
Needle transport
A metaphor (is Greek for ‘truck’ after all)
Information overload?
Too much knowledge?
Stop acquiring it?
Or organisation underload?
Unprecedented opportunity?
Just filtering it?
Lack of conceptual structure?
Information overload?
Too much knowledge?
Stop acquiring it?
Or organisation underload?
Unprecedented opportunity!
Just filtering it?
Lack of conceptual structure?
Another metaphor:
What is the use of water?
H2O
Drink(take in)
What is the use of information?
Read(take in)
Age to Know
Publish articles
Stretching the water metaphor:
It’s already raining – we must build the ark
The ‘animals’ to come on board:
Slide by Carl Lagoze (Cornell) – from this presentation:http://journal.webscience.org/112/3/orechem.pdf
Stretching the metaphor further:
If you need water, rain is free
But if you want quality control and convenience:
curated
Co-occurrence
All Triples
Remove Ambiguity and Redundancy
Curated
Observational
Smart Triples
Inferred
Knowledge Space
(node 1, unique ID) (node 2, unique ID)
< Source concept > < Relations (edge) > < Target Concept >
class date value owner condition DOI
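The triple model on this slide can be sketched as a small data structure. This is only an illustrative sketch, not the Concept Web Alliance's actual schema: the class name `Triple`, the field names and the example identifiers are hypothetical, chosen to mirror the labels on the slide (class, date, value, owner, condition, DOI).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """One edge between two concepts, each identified by a unique ID."""
    source_id: str   # node 1, unique ID (source concept)
    relation: str    # the relation (edge) linking the two concepts
    target_id: str   # node 2, unique ID (target concept)
    # Edge attributes named on the slide; all optional in this sketch.
    edge_class: Optional[str] = None  # "class" is a Python keyword, hence edge_class
    date: Optional[str] = None
    value: Optional[float] = None
    owner: Optional[str] = None
    condition: Optional[str] = None
    doi: Optional[str] = None


# Hypothetical example: identifiers and attribute values are made up.
example = Triple(
    source_id="concept:0001",
    relation="co-occurs_with",
    target_id="concept:0002",
    edge_class="C1",
    owner="public",
)
```

Making the record immutable (`frozen=True`) also makes it hashable, so identical triples collapse naturally when placed in a set.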
<Type F1> Database facts (multiple attributes)
<Type F2> Community Annotations
<Type C1> Co-occurrence sentence (abstracts, e.g. PubMed)
<Type C2> Co-occurrence Full Text (publisher, e.g. Springer)
<Type A1> Concept Profile Match
<Type A3> Co-expression (gene expression databases)
<Type A4> Modelling hypothesis (e.g. Plectix, InWeb)
Graph Building (e.g. WikiPathways)
Multiple Triples
F+C+A+
C+A+
A+
Unique to Springer
Unique to Plectix
Unique to 101668678
T-Cell Development
Cancer Promoting Genes
Interleukin-7
(node 1, unique ID) (node 2, unique ID)
< Source concept > < Relations (edge) > < Target Concept >
class date value author condition DOI
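One possible reading of the labels F+C+A+, C+A+ and A+ is that they record which families of edge type (F for facts, C for co-occurrence, A for the A-type edges) support a given pair of concepts. Under that assumption, a minimal sketch of deriving such a label from a list of edge types might look as follows; the function name `evidence_signature` is hypothetical.

```python
from typing import Iterable


def evidence_signature(edge_types: Iterable[str]) -> str:
    """Summarise which evidence families (F, C, A) support a concept pair.

    Assumes each edge type is a non-empty code such as "F1", "C2" or "A4",
    whose first letter names the family.
    """
    families = {edge_type[0] for edge_type in edge_types}
    # Emit the families in the fixed order used on the slide: F, C, A.
    return "".join(f"{family}+" for family in "FCA" if family in families)
```

For example, a pair supported by a database fact, a PubMed sentence co-occurrence and a co-expression edge (`["F1", "C1", "A3"]`) would be labelled `F+C+A+`, while a pair supported only by a modelling hypothesis (`["A4"]`) would be labelled `A+`.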
Triples
Remove Ambiguity and Redundancy
Curated
Observational
Smart Triples
Inferred; constructed
Knowledge Space
(node 1, unique ID) (node 2, unique ID)
< Source concept > < Relations (edge) > < Target Concept >
class date value owner condition Etc.
Remove Ambiguity and Redundancy
In these areas significant value is added to the triples
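Removing ambiguity and redundancy can be illustrated with a toy example: different surface terms are first mapped to one unique concept ID, and triples that then become identical are merged while their provenance is kept. This is a sketch under stated assumptions, not the curation pipeline described in the talk; the synonym table, the concept IDs and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym table: surface terms mapped to unique concept IDs.
SYNONYMS = {
    "interleukin-7": "concept:IL7",
    "il-7": "concept:IL7",
    "il7": "concept:IL7",
    "t-cell development": "concept:TCD",
}


def normalise(term):
    """Resolve ambiguity: map a term to its concept ID if one is known."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)  # unknown terms pass through lower-cased


def merge_triples(raw):
    """Remove redundancy: collapse duplicate triples, keeping provenance.

    raw is an iterable of (source_term, relation, target_term, provenance).
    Returns a dict mapping (source_id, relation, target_id) to a set of
    provenance labels.
    """
    merged = defaultdict(set)
    for source, relation, target, provenance in raw:
        merged[(normalise(source), relation, normalise(target))].add(provenance)
    return dict(merged)
```

Three statements that mention "IL-7", "Interleukin-7" and "IL7" alongside "T-Cell Development" would reduce to a single triple whose provenance set lists every source that asserted it.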
The ‘trustmark’ CWA™:
Triple ‘model’
Best practice
Interoperability
Et cetera
Download Concept Web Alliance certified triples
Includes edges from:
PubMed (400,000,000 sentences, 5,000,000,000 concept co-occurrences) (from public data)
Protein databases (UniProt, IntAct, PDB, HPRD – 75,000 human curated PPIs) (from public data)
Private expression data (3000 extra edges, by Merck) (from proprietary data)
InWeb edges (240,000 unique edges from 17 species) (from proprietary data)
Plectix edges (5,000 extra edges, PPI modeling) (from proprietary data)
Gene co-expression databases (GEO, Express… – 25 square genes) (from public data)
STRING edges (200,000 gene-gene edges) (from semi public data)
Reactome edges (240,000 unique edges from 17 species) (from proprietary data)
Chemspider edges (25,000,000 chemicals) (from semi public data)
Wiki edges (WikEdge = WikiPathways, WikiProfessionals, Omegawiki, Wikigene)
Et Cetera
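Because each edge source above carries an access label (public, semi public or proprietary), a consumer of the triples could filter sources by the access levels open to them. The sketch below illustrates that idea with a few of the sources named on the slide; the `Access` enum, the `EDGE_SOURCES` table and the `visible_sources` function are hypothetical names, not part of any Concept Web Alliance tooling.

```python
from enum import Enum


class Access(Enum):
    PUBLIC = "public"
    SEMI_PUBLIC = "semi public"
    PROPRIETARY = "proprietary"


# A subset of the sources listed on the slide, with the access label
# the slide gives for each.
EDGE_SOURCES = {
    "PubMed": Access.PUBLIC,
    "UniProt": Access.PUBLIC,
    "STRING": Access.SEMI_PUBLIC,
    "Chemspider": Access.SEMI_PUBLIC,
    "InWeb": Access.PROPRIETARY,
    "Plectix": Access.PROPRIETARY,
}


def visible_sources(allowed):
    """Return the names of sources whose access level is in `allowed`."""
    return sorted(
        name for name, access in EDGE_SOURCES.items() if access in allowed
    )
```

A user restricted to public data would call `visible_sources({Access.PUBLIC})`, while a licensed user could pass all three access levels to see every source.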