2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Semantic Approaches for Biochemical Knowledge Discovery

Michel Dumontier, Ph.D.

Associate Professor of Medicine (Biomedical Informatics), Stanford University

@micheldumontier::ACS:15-03-2016

Science!


“Most published research findings are false.” - John Ioannidis, Stanford University

Science is hard.

Scientific knowledge is growing at an unprecedented rate

Reusing raw and curated data in thousands of databases is challenging: identifiers, formats, access methods, links

Various software tools are needed to analyze data (problems: OS, versioning, input/output formats)

Ultimately, scientists develop fairly sophisticated programs/workflows to test hypotheses

In the absence of intelligent systems, this requires vast amounts of experience and technical expertise

How can we automatically find the evidence that supports or disputes a scientific hypothesis using the latest data, tools and scientific knowledge?

So what do we need to achieve this?

1. Data Science Tools and Methods
– To identify, represent, interlink, integrate, and query data and services
– To identify and uncover support for known or novel associations

2. Community Standards to share and interrogate a massive, decentralized network of interconnected data and software

First, we need FAIR data

Findable
– Globally unique identifiers for datasets and the data they contain
– Rich set of descriptors to search and filter with
– Indexed and searchable

Accessible
– Metadata is eternally available.
– Identifiers are used to retrieve representations using standard protocols (e.g.

Interoperable
– Data represented with formal knowledge representations
– Include links to other datasets/vocabularies

Reusable
– Licensing, Provenance, Community standards
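The four FAIR properties can be read as a checklist over a dataset's metadata record. The following is a minimal sketch under that reading. The field names (`identifier`, `description`, `access_url`, `vocabularies`, `license`, `provenance`) and the example record are hypothetical; they are not taken from the FAIR paper or from any metadata standard.

```python
from typing import Dict, List

# Hypothetical mapping from each FAIR property to the metadata fields
# that would have to be filled in for the property to be minimally met.
FAIR_FIELDS: Dict[str, List[str]] = {
    "Findable": ["identifier", "description"],
    "Accessible": ["access_url"],
    "Interoperable": ["vocabularies"],
    "Reusable": ["license", "provenance"],
}


def missing_fair_fields(record: Dict[str, object]) -> Dict[str, List[str]]:
    """Return, per FAIR property, the fields that are absent or empty."""
    gaps: Dict[str, List[str]] = {}
    for prop, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[prop] = absent
    return gaps


# Illustrative record only; every value is a placeholder.
example_record = {
    "identifier": "http://example.org/dataset/1",
    "description": "Example dataset of drug-target interactions",
    "access_url": "http://example.org/dataset/1/download",
    "vocabularies": [],
    "license": "",
    "provenance": "Derived from an example source",
}

if __name__ == "__main__":
    for prop, fields in missing_fair_fields(example_record).items():
        print(f"{prop}: missing {', '.join(fields)}")
```

A check like this only tests for the presence of fields; it says nothing about whether an identifier actually resolves or whether a license permits reuse.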

“Numbers have no way of speaking for themselves. We need to imbue them with meaning.” - Nate Silver, The signal and the noise

FAIR: Findable, Accessible, Interoperable, Re-usable

See paper for motivation and examples

We are now starting to think about quality measures.

The Semantic Web is the new global web of knowledge

standards for publishing, sharing and querying facts, expert knowledge and services

scalable approach for the discovery of independently formulated and distributed knowledge

Linked Data is FAIR data

Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Linked Data for the Life Sciences

Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF.

chemicals/drugs/formulations, genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers, treatments
Terminologies & publications

• 11B+ interlinked statements from 35 biomedical datasets
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
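To make the idea of a unified RDF representation concrete, the sketch below serializes a few statements as N-Triples using only the Python standard library. It assumes the `http://bio2rdf.org/<namespace>:<identifier>` URI pattern that appears in the query on a later slide. The namespaces `exampledb` and `otherdb`, the record identifiers, and the `x-ref` predicate are hypothetical placeholders, not statements copied from Bio2RDF.

```python
from typing import Iterable, Tuple

BIO2RDF = "http://bio2rdf.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# A triple is (subject IRI, predicate IRI, object); the object is either
# an IRI or an already-quoted literal produced by literal().
Triple = Tuple[str, str, str]


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI following the namespace:identifier pattern."""
    return f"{BIO2RDF}{namespace}:{identifier}"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples one per line; objects starting with a quote are literals."""
    lines = []
    for subject, predicate, obj in triples:
        rendered = obj if obj.startswith('"') else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


# Hypothetical records from two datasets, linked by a cross-reference.
drug = bio2rdf_uri("exampledb", "D0001")
target = bio2rdf_uri("otherdb", "P0001")
example_triples = [
    (drug, RDFS_LABEL, literal("Example drug")),
    (target, RDFS_LABEL, literal("Example protein")),
    (drug, bio2rdf_uri("example_vocabulary", "x-ref"), target),
]

if __name__ == "__main__":
    print(to_ntriples(example_triples))
```

The third statement is the interlinking step: because both records use the same URI pattern, a link from one dataset to another is just one more triple.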

Bio2RDF shows how datasets are connected together

graph methods for data quality to find mismatches and discover new links

W Hu, H Qiu, M Dumontier. Link Analysis of Life Science Linked Data. International Semantic Web Conference (2) 2015: 446-462.
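One simple kind of mismatch that a graph view can surface is a link whose target does not exist in the dataset it points to. The sketch below illustrates that single check. It is not the method of the paper cited above, and the dataset names and identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A link is (source identifier, target identifier), each written "namespace:id".
Link = Tuple[str, str]


def namespace_of(identifier: str) -> str:
    """Return the part before the first colon."""
    return identifier.split(":", 1)[0]


def dangling_links(links: Iterable[Link], known: Dict[str, Set[str]]) -> List[Link]:
    """Return links whose target is not among the known identifiers of its namespace."""
    result: List[Link] = []
    for source, target in links:
        identifiers = known.get(namespace_of(target), set())
        if target not in identifiers:
            result.append((source, target))
    return result


# Hypothetical identifiers known to exist in two datasets.
known_identifiers = {
    "dbA": {"dbA:1", "dbA:2"},
    "dbB": {"dbB:9"},
}
example_links = [
    ("dbA:1", "dbB:9"),   # target exists
    ("dbA:2", "dbB:10"),  # target missing from dbB
]

if __name__ == "__main__":
    for source, target in dangling_links(example_links, known_identifiers):
        print(f"{source} -> {target} (target not found)")
```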

Federated Queries over public SPARQL Endpoints

Get all protein catabolic processes (and more specific GO terms) in biomodels

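PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
WHERE {
  # GO terms that are more specific than "protein catabolic process"
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  # BioModels reactions annotated with those GO terms
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label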
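A query like the one above could be submitted over HTTP using the SPARQL 1.1 Protocol, which accepts the query text in a `query` parameter. The sketch below only builds the request and parses a document in the standard SPARQL JSON results format; it does not contact any server. The helper names are hypothetical, the sample response is made up for illustration, and the endpoint URLs from the slide may no longer be reachable.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the query text."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(document: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON results document into one dict per solution."""
    data = json.loads(document)
    names = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are simply absent from a binding.
        rows.append({n: binding[n]["value"] for n in names if n in binding})
    return rows


# Made-up response in the SPARQL JSON results format, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["go", "label"]},
    "results": {"bindings": [
        {
            "go": {"type": "uri", "value": "http://example.org/go/1"},
            "label": {"type": "literal", "value": "example process"},
        }
    ]},
})

if __name__ == "__main__":
    request = build_sparql_request(
        "http://bioportal.bio2rdf.org/sparql",
        "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
    )
    print(request.full_url)
    print(rows_from_results(SAMPLE_RESPONSE))
```

To actually run a query, the request would be passed to `urllib.request.urlopen` and the response body handed to `rows_from_results`. With a federated query, the endpoint that receives the request is the one that evaluates the `SERVICE` clauses.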

EbolaKB: Using Linked Data and Software

Kamdar, Dumontier. An Ebola virus-centered knowledge base. Database. 2015 Jun 8;2015. doi: 10.1093/database/bav049.

Network analysis and discovery

Jim McCusker & Deb McGuinness

David Wild, Ying Ding

tactical formalization

Take what you need and represent it in a way that directly serves your objective

STANDARDS for broader reuse

APPLICATIONS for optimized experience

High Quality Metadata are Essential for Large-Scale Reuse and Biomedical Discovery

Making it Easier, Possibly Even Pleasant, to Author Interoperable Experimental Metadata

metadatacenter.org

NIH COMMONS

smartAPI

The goal is to reduce the barrier for the discovery and reuse of web APIs through richer semantic metadata.

i) a coordinated facility for the intelligent annotation of smart APIs

ii) a web application to discover smart APIs and how they connect to each other.

iii) The augmentation of existing APIs to provide FAIR data

smartAPI

myGene.info, myVariant.info

Linking API Data

Web Services

Linked Data Cloud

Evan’s Questions

• What should we be doing now?
– Encouraging researchers to publish FAIR data and services

• How should we be doing it?
– As Linked Data
– In institutional repositories, and available in Wikidata and other aggregators

• Where are things going in the future?
– Reproducible analyses over indexed, archived, and massively connected knowledge graphs

dumontierlab.com
michel.dumontier@stanford.edu

Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier
