2016 ACS Semantic Approaches for Biochemical Knowledge Discovery


Semantic Approaches for Biochemical Knowledge Discovery


Michel Dumontier, Ph.D.

Associate Professor of Medicine (Biomedical Informatics), Stanford University

@micheldumontier::ACS:15-03-2016

Science!


“Most published research findings are false.” - John Ioannidis, Stanford University


Science is hard.

Scientific knowledge is growing at an unprecedented rate


Reusing raw and curated data in thousands of databases is challenging: identifiers, formats, access methods, links


Various software tools are needed to analyze data (problems: OS, versioning, input/output formats)


Ultimately, scientists develop fairly sophisticated programs/workflows to test hypotheses


Without intelligent systems, this requires vast amounts of experience and technical expertise


How can we automatically find the evidence that supports or disputes a scientific hypothesis using the latest data, tools and scientific knowledge?


So what do we need to achieve this?

1. Data Science Tools and Methods
– To identify, represent, interlink, integrate, and query data and services
– To identify and uncover support for known or novel associations

2. Community Standards to share and interrogate a massive, decentralized network of interconnected data and software


First, we need FAIR data

Findable
– Globally unique identifiers for datasets and the data they contain
– Rich set of descriptors to search and filter with
– Indexed and searchable

Accessible
– Metadata is eternally available
– Identifiers are used to retrieve representations using standard protocols (e.g. HTTP)

Interoperable
– Data represented with formal knowledge representations
– Include links to other datasets/vocabularies

Reusable
– Licensing, Provenance, Community standards
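
The "Accessible" point, identifiers that retrieve representations over a standard protocol, can be illustrated with HTTP content negotiation. The following is a minimal sketch rather than anything taken from the talk: the identifier URI is a placeholder, the helper name build_rdf_request is hypothetical, and the request is only prepared, never sent.

```python
from urllib.request import Request

# Registered media types for two common RDF serializations.
RDF_MEDIA_TYPES = ("text/turtle", "application/rdf+xml")


def build_rdf_request(identifier_uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    accept = ", ".join(RDF_MEDIA_TYPES)
    return Request(identifier_uri, headers={"Accept": accept}, method="GET")


# Placeholder identifier, used only for illustration (an assumption).
request = build_rdf_request("http://example.org/dataset/record-1")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the retrieval; whether a given server honors the Accept header depends on how that server is configured.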


“Numbers have no way of speaking for themselves. We need to imbue them with meaning.” - Nate Silver, The Signal and the Noise


FAIR: Findable, Accessible, Interoperable, Re-usable

See paper for motivation and examples

We are now starting to think about quality measures.

The Semantic Web is the new global web of knowledge


standards for publishing, sharing and querying facts, expert knowledge and services

scalable approach for the discovery of independently formulated and distributed knowledge

Linked Data is FAIR data

Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/


Linked Data for the Life Sciences


Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF.

• chemicals/drugs/formulations
• genomes/genes/proteins, domains
• interactions, complexes & pathways
• animal models and phenotypes
• disease, genetic markers, treatments
• terminologies & publications

• 11B+ interlinked statements from 35 biomedical datasets
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
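
Representing data in RDF means expressing each fact as a subject-predicate-object statement whose parts are URIs or literal values. The sketch below serializes two such statements as N-Triples using only the standard library. Every example.org URI, the "targets" predicate, and the helper names are placeholders chosen for illustration; they are not actual Bio2RDF identifiers or vocabulary.

```python
from typing import NamedTuple

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Statement(NamedTuple):
    subject: str              # URI of the thing being described
    predicate: str            # URI of the relation
    obj: str                  # URI, or literal text when is_literal is True
    is_literal: bool = False


def escape_literal(text: str) -> str:
    """Escape backslash, double quote, and line breaks inside a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(statement: Statement) -> str:
    """Serialize one statement as a single N-Triples line."""
    if statement.is_literal:
        obj = '"' + escape_literal(statement.obj) + '"'
    else:
        obj = "<" + statement.obj + ">"
    return f"<{statement.subject}> <{statement.predicate}> {obj} ."


# Placeholder statements (assumptions): a label, and a link to another resource.
statements = [
    Statement("http://example.org/drug/1", RDFS_LABEL, "example drug", True),
    Statement("http://example.org/drug/1",
              "http://example.org/vocab/targets",
              "http://example.org/protein/7"),
]
for s in statements:
    print(to_ntriples(s))
```

The second statement is the interlinking step: its object is the identifier of a resource that could live in a different dataset.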


Bio2RDF shows how datasets are connected together


graph methods for data quality to find mismatches and discover new links


W Hu, H Qiu, M Dumontier. Link Analysis of Life Science Linked Data. International Semantic Web Conference (2) 2015: 446-462.

Federated Queries over public SPARQL Endpoints

Get all protein catabolic processes (and more specific GO terms) in biomodels

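PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
WHERE {
  # GO terms below "protein catabolic process" in the class hierarchy
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  # BioModels reactions annotated with those GO terms
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label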
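
A query like this is sent to an endpoint over HTTP using the SPARQL 1.1 Protocol, and results can come back in the SPARQL JSON results format. The sketch below prepares such a request and flattens a results document using only the standard library. The endpoint URL and the tiny query are placeholders, the helper names are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL 1.1 Protocol 'query via GET' request (not sent here)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json: str) -> list:
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    document = json.loads(results_json)
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so skip them.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hypothetical endpoint and a deliberately small query, for illustration only.
request = build_sparql_request(
    "http://example.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
print(request.full_url)
```

With a federated query, the client still contacts one endpoint; that endpoint is the one that forwards each SERVICE block to the remote endpoints named in the query.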


EbolaKB

Using Linked Data and Software


Kamdar, Dumontier. An Ebola virus-centered knowledge base. Database. 2015 Jun 8;2015. doi: 10.1093/database/bav049.


Network analysis and discovery

Jim McCusker & Deb McGuinness

David Wild, Ying Ding


HyQue

tactical formalization


Take what you need and represent it in a way that directly serves your objective

STANDARDS for broader reuse

APPLICATIONS for optimized experience

High Quality Metadata are Essential

for Large-Scale Reuse and Biomedical Discovery


Making it Easier, Possibly Even Pleasant, to Author Interoperable Experimental Metadata


metadatacenter.org

NIH COMMONS


smartAPI

The goal is to reduce the barrier for the discovery and reuse of web APIs through richer semantic metadata.

i) a coordinated facility for the intelligent annotation of smart APIs

ii) a web application to discover smart APIs and how they connect to each other

iii) the augmentation of existing APIs to provide FAIR data
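
One way to picture how annotated APIs "connect to each other" is to match the identifier types one API returns against the types another accepts. This is a minimal sketch under stated assumptions, not the smartAPI implementation: the API names, the type labels, and the ApiDescription structure are all hypothetical, and real annotations would more likely use shared vocabulary URIs than short strings.

```python
from typing import List, NamedTuple, Set, Tuple


class ApiDescription(NamedTuple):
    name: str
    accepts: Set[str]   # identifier types the API takes as input
    returns: Set[str]   # identifier types found in its responses


def find_connections(apis: List[ApiDescription]) -> List[Tuple[str, str, str]]:
    """Return (producer, consumer, shared_type) for every output-to-input match."""
    connections = []
    for producer in apis:
        for consumer in apis:
            if producer.name == consumer.name:
                continue
            for shared in sorted(producer.returns & consumer.accepts):
                connections.append((producer.name, consumer.name, shared))
    return connections


# Hypothetical APIs and type labels, for illustration only.
apis = [
    ApiDescription("variant-api", {"variant-id"}, {"gene-id"}),
    ApiDescription("gene-api", {"gene-id"}, {"protein-id"}),
]
print(find_connections(apis))
```

Here the only connection found runs from variant-api to gene-api, because the first returns the identifier type that the second accepts.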


smartAPI


Gene

myGene.info
myVariant.info

Linking API Data

Web Services

Linked Data Cloud


Evan’s Questions

• What should we be doing now?
– Encouraging researchers to publish FAIR data and services

• How should we be doing it?
– As Linked Data
– Institutional repositories and available in Wikidata and other aggregators

• Where are things going in the future?
– Reproducible analyses over indexed, archived, and massively connected knowledge graphs


michel.dumontier@stanford.edu

Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier
