Triples & Access Jan Velterop

DESCRIPTION

Presentation given at the inaugural meeting of the Concept Web Alliance, 8 May 2009

Text of Triples And Access

  • 1. Triples & Access Jan Velterop
  • 2. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact. Mark Twain, Life on the Mississippi
  • 3. Oh yeah? We get far too few returns of usable knowledge out of such an overwhelming investment of fact! A lot of fact is deeply hidden!
  • 4. Current Knowledge Transfer: a metaphor (is Greek for truck, after all). Needle transport.
  • 5. Information overload? Too much knowledge? Stop acquiring it? Just filtering it? Or organisation underload? Lack of conceptual structure? Unprecedented opportunity?
  • 6. Information overload? Too much knowledge? Stop acquiring it? Just filtering it? Or organisation underload? Lack of conceptual structure? Unprecedented opportunity!
  • 7. Another metaphor: What is the use of water?
  • 8. H2O Drink (take in)
  • 9. What is the use of information?
  • 10. Age to Know Read (take in)
  • 11. Publish articles
  • 12. Stretching the water metaphor: It's already raining; we must build the ark
  • 13. The animals to come on board:
  • 14. Slide by Carl Lagoze (Cornell) from this presentation: http://journal.webscience.org/112/3/orechem.pdf
  • 15. Stretching the metaphor further: If you need water, rain is free
  • 16. But if you want quality control and convenience:
  • 17. Triple model: (node 1, unique ID) <Source concept> <Relation (edge)> <Target concept> (node 2, unique ID). Attributes: class, date, value, owner, condition, DOI. Diagram labels: All Triples; Smart Triples; Curated; Observational (co-occurrence); Inferred; Remove Ambiguity and Redundancy; Knowledge Space.
  • 18. Triple model: (node 1, unique ID) <Source concept> <Relation (edge)> <Target concept> (node 2, unique ID). Attributes: class, date, value, author, condition, DOI. Sources of multiple triples: database facts (multiple attributes); community annotations; co-occurrence in sentences (abstracts, e.g. PubMed); co-occurrence in full text (publisher, e.g. Springer); concept profile match; co-expression (gene expression databases); modelling hypothesis (e.g. Plectix, InWeb); graph building (e.g. WikiPathways). Diagram labels: F+C+A+; C+A+; A+; T-Cell Development; Cancer Promoting Genes; Interleukin-7; Unique to 101668678; Unique to Springer; Unique to Plectix.
  • 19. Unique to 101668678
  • 20. (Same triple model and evidence-source diagram as slide 18.)
  • 21. Triple model: (node 1, unique ID) <Source concept> <Relation (edge)> <Target concept> (node 2, unique ID). Attributes: class, date, value, owner, condition, etc. Diagram labels: Triples; Smart Triples; Curated; Observational; Inferred, constructed; Remove Ambiguity and Redundancy (at each layer); Knowledge Space. In these areas significant value is added to the triples.
  • 22. The trustmark CWA™: Triple model; Best practice; Interoperability; Et cetera
  • 23. Download Concept Web Alliance certified triples. Includes edges from:
      • PubMed (400,000,000 sentences, 5,000,000,000 concept co-occurrences) (from public data)
      • Protein databases (UniProt, IntAct, PDB, HPRD; 75,000 human curated PPIs) (from public data)
      • Gene co-expression databases (GEO, Express 25 square genes) (from public data)
      • STRING edges (200,000 gene-gene edges) (from semi public data)
      • InWeb edges (240,000 unique edges from 17 species) (from proprietary data)
      • Reactome edges (240,000 unique edges from 17 species) (from proprietary data)
      • ChemSpider edges (25,000,000 chemicals) (from semi public data)
      • Wiki edges (WikEdge = WikiPathways, WikiProfessionals, Omegawiki, Wikigene)
      • Plectix edges (5,000 extra edges, PPI modeling) (from proprietary data)
      • Private expression data (3,000 extra edges, by Merck) (from proprietary data)
      • Et cetera
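
The triple model on slides 17 to 21 (source concept, relation, target concept, each node with a unique ID, plus attributes such as class, date, value, owner, condition and DOI) can be sketched as a small data structure. This is only an illustration under stated assumptions, not the Concept Web Alliance's actual format: the field types, the ID strings, the relation name, and the ranking of curated over observational over inferred evidence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """One edge: source concept, relation, target concept (names assumed)."""
    source_id: str       # node 1, unique ID (ID format is hypothetical)
    relation: str        # the edge between the two concepts
    target_id: str       # node 2, unique ID
    evidence_class: str  # "curated", "observational" or "inferred"
    # Remaining attributes listed on the slides; types are assumptions.
    date: Optional[str] = None
    value: Optional[float] = None
    owner: Optional[str] = None
    condition: Optional[str] = None
    doi: Optional[str] = None


def remove_redundancy(triples):
    """Keep one triple per (source, relation, target).

    When duplicates exist, prefer curated over observational over
    inferred evidence. This ranking is an assumption for illustration.
    """
    rank = {"curated": 0, "observational": 1, "inferred": 2}
    best = {}
    for t in triples:
        key = (t.source_id, t.relation, t.target_id)
        kept = best.get(key)
        if kept is None or rank[t.evidence_class] < rank[kept.evidence_class]:
            best[key] = t
    return list(best.values())


# Made-up example edges; IDs, relation and owners are illustrative only.
triples = [
    Triple("concept:A", "related_to", "concept:B", "observational", owner="source-1"),
    Triple("concept:A", "related_to", "concept:B", "curated", owner="source-2"),
    Triple("concept:A", "related_to", "concept:C", "inferred"),
]
smart = remove_redundancy(triples)
print(len(smart), [t.evidence_class for t in smart])
```

Keying on the (source, relation, target) tuple is one simple reading of "remove ambiguity and redundancy"; resolving ambiguity between differently named concepts would need an ID-mapping step that the slides do not describe.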