41
The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San Diego

The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Embed Size (px)

Citation preview

Page 1: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

The Neuroscience Information Framework

Establishing a practical semantic framework for neuroscienceMaryann Martone, Ph. D.

University of California, San Diego

Page 2: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF TeamAmarnath Gupta, UCSD, Co Investigator

Jeff Grethe, UCSD, Co Investigator

Gordon Shepherd, Yale University

Perry Miller

Luis Marenco

David Van Essen, Washington University

Erin Reid

Paul Sternberg, Cal Tech

Arun Rangarajan

Hans Michael Muller

Giorgio Ascoli, George Mason University

Sridevi Polavarum

Anita Bandrowski, NIF Curator

Fahim Imam, NIF Ontology Engineer

Karen Skinner, NIH, Program Officer

Lee Hornbrook

Kara Lu

Vadim Astakhov

Xufei Qian

Chris Condit

Stephen Larson

Sarah Maynard

Bill Bug

Karen Skinner, NIH

Page 3: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

What does this mean?

• 3D Volumes• 2D Images• Surface meshes• Tree structure• Ball and stick models• Little squiggly lines

Data People

Information systems

Page 4: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience

http://neuinfo.org

UCSD, Yale, Cal Tech, George Mason, Washington Univ

Supported by NIH Blueprint

A portal for finding and using neuroscience resources

A consistent framework for describing resources

Provides simultaneous search of multiple types of information, organized by category

Supported by an expansive ontology for neuroscience

Utilizes advanced technologies to search the “hidden web”

Page 5: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Where do I find…

• Data• Software tools• Materials• Services• Training• Jobs• Funding opportunities

• Websites• Databases• Catalogs• Literature• Supplementary

material• Information portals

...And how many are there?

Page 6: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF in action

Page 7: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Query expansion: Synonyms and related concepts

Boolean queriesData sources

categorized by “data type” and level of nervous

system

Simplified views of complex data

sources

Tutorials for using full resource when getting there from

NIF

Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”

Page 8: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF searches across multiple sources of information

• NIF data federation• Independent

databases registered with NIF or through web services

• NIF registry: catalog• NIF Web: custom

web index (work in progress)

• NIF literature: neuroscience-centered literature corpus (Textpresso)

Page 9: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Guiding principles of NIF• Builds heavily on existing technologies (open source tools)• Information resources come in many sizes and flavors• Framework has to work with resources as they are, not as we

wish them to be– Federated system; resources will be independently maintained– Very few use standard terminology or map to ontologies

• No single strategy will work for the current diversity of neuroscience resources

• Trying to design the framework so it will be as broadly applicable as possible to those who are trying to develop technologies

• Interface neuroscience to the broader life science community• Take advantage of emerging conventions in search and in

building web communities

Page 10: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Registering a Resource to NIF

Level 1NIF Registry: high level descriptions from NIF vocabularies supplied by human curators

Level 2Access to deeper content; mechanisms for query and discovery; DISCO protocol

Level 3Direct query of web accessible databaseAutomated registrationMapping of database content to NIF vocabulary by human

Page 11: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

The NIF Registry

• Very, very simple model

• Annotated with NIF vocabularies• Resource type• Organism

• Reviewed by NIF curators

Page 12: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Level 2: Updates and deeper integration• DISCO involves a collection of files that reside on each participating

resource. These files store information describing: - attributes of the resource, e.g., description, contact person, content of the resource, etc. -> updates NIF registry - how to implement DISCO capabilities for the resource

• These files are maintained locally by the resource developers and are “harvested” by the central DISCO server.

• In this way, central NIF capabilities can be updated automatically as resources evolve over time.

• The developers of each resource choose which DISCO capabilities their resource will utilize

Luis Marenco, MD, Rixin Wang, PhD, Perry L. Miller, MD, PhD, Gordon Shepherd, MD, DPhilYale University School of Medicine

Page 13: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

DISCO Level 2 Interoperation

• Level 2 interoperation is designed for resources that have only Web interfaces (no database API).

• Different resources require different approaches to achieve Level 2 interoperation. Examples are:

1. CRCNS - requires metadata tagging of Web pages

2. DrugBank - requires directed traversal of Web pages to extract data into a NIF data repository

3. GeneNetwork - requires Web-based queries to achieve “relational-like” views using “wrappers”

Page 14: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

DrugBank Example

The DrugBank Web interface showing data about a specific drug (Phentoin).

Page 15: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

DrugBank Example (continued)

This DISCO Interoperation file specifies how to extract data from the DrugBank Web interface automatically.

Page 16: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

DrugBank Example (continued)

A NIF user views data retrieved from DrugBank in response to a query in a transparent, integrated fashion.

Page 17: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Level 3• Deep query of federated databases with

programmatic interface• Register schema with NIF

– Expose views of database– Map vocabulary to NIFSTD

• Currently works with relational and XML databases– RDF capability planned for NIF 2.5 (April 2010)

• Works with NIF registry: databases also annotated according to data type and biological area

Page 18: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Integrated views and gene search

Page 19: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Is GRM1 in cerebral cortex?

• NIF system allows easy search over multiple sources of information• Well known difficulties in search

• Inconsistent and sparse annotation of scientific data• Many different names for the same thing• No standards for data exchange or annotation at the semantic level

– Lack of standards in data annotation require a lot of human investment in reconciling information from different sources

Allen Brain Atlas

MGD

Gensat

Page 20: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Cerebral CortexAtlas Children Parent

Genepaint Neocortex, Olfactory cortex (Olfactory bulb; piriform cortex), hippocampus

Telencephalon

ABA Cortical plate, Olfactory areas, Hippocampal Formation

Cerebrum

MBAT (cortex) Hippocampus, Olfactory, Frontal, Perirhinal cortex, entorhinal cortex

Forebrain

MBL Doesn’t appear

GENSAT Not defined Telencephalon

BrainInfo frontal lobe, insula, temporal lobe, limbic lobe, occipital lobe

Telencephalon

BrainmapsEntorhinal, insular, 6, 8, 4, A SII 17, Prp, SI

Telencephalon

Page 21: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Modular ontologies for neuroscience

NIF covers multiple structural scales and domains of relevance to neuroscience Incorporated existing ontologies where possible; extending them for neuroscience where necessary Normalized under the Basic Formal Ontology: an upper ontology used by the OBO Foundry Based on BIRNLex: Neuroscientists didn’t like too many choices Cross-domain relationships are being built in separate files Encoded in OWL-DL, but also maintained in a Wiki form, a relational database form and any other way it is needed

NIFSTD

NS FunctionMolecule InvestigationSubcellular Anatomy

Macromolecule Gene

Molecule Descriptors

Techniques

Reagent Protocols

Cell

Instruments

Bill Bug

NS Dysfunction QualityMacroscopicAnatomyOrganism

Resource

Page 22: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

How are ontologies used?

• Search: query expansion– Synonyms– Related classes– “concept based queries”

• Annotation:– Resource categorization– Entity mapping

• Ranking of results– NIF Registry; NIF Web

Page 23: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Concept-based search: Entity mapping

Brodmann area 3 Brodmann.3

Synonyms and explicit mapping of database content help smooth over terminology differences and custom terminologies

Page 24: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Concept-based query: GABAergic neuron

• Simple search will not return examples of GABAergic neurons unless the data are explicitly tagged as such

• Too many possible classifications to get them all by keywords

• Classes are logically defined in NIF ontology• i.e., a GABAergic neuron is any

neuron that uses GABA as a neurotransmitter

Page 25: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San
Page 26: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Reclassification based on logical definitions: GABA neuron is any member of class neuron that has neurotransmitter GABA

NIF Bridge File• NIF Cell• NIF Molecule

• Define a set of properties that relate neuron classes to molecule classes, e.g.,

• Neuron• Has

neurotransmitter

• Purkinje cell is a Neuron

• Purkinje cell has neurotransmitter GABA

Page 27: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF Bridge Files: Defining classes to enhance neuroscience search

• Neuron by neurotransmitter• Neuron by brain region• Neuron by circuit role• Neuron by morphology

• Molecule by role• Neurotransmitter• Drug• Drug of abuse

Page 28: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF 2.5: Linking NIF entities

Page 29: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF Cards: Linked data

Page 30: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF 3.0+: NIF Annotation Standards

• Tying keyword search to quantitative definitions

• NIF is developing mappings between “data types” (e.g., age) and NIF annotation standards, e.g., adult

• Query string: adult mouse– Translation: Adult mouse = >/= 40 days postnatal

Page 31: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Building NIF ontologies: Balancing act• Different schools of thought as to how to build vocabularies and

ontologies• NIF is trying to navigate these waters, keeping in mind:

– NIF is for both humans and machines– Our primary concern is data– We have to meet the needs of the community– We have a budget and deadlines

• Building ontologies is difficult even for limited domains, never mind all of neuroscience, but we’ve learned a few things– Reuse what’s there: trying to re-use URI’s rather than map when possible– Make what you do reusable: adopt best practices where feasible

• Numerical identifiers, unique labels, single asserted simple hierarchies– Engage the community– Avoid “religious” wars: separate the science from the informatics– Start simple and add more complexity

• Create modular building blocks from which other things can be built

Page 32: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Ontologies, etc

Mike Bergman

Page 33: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

What we’ve learned• Strategy: Create modular building blocks that can be knit

into many things– Step 1: Build core lexicon (NeuroLex)

• Classes and their definitions• Simple single inheritance and non-controversial hierarchies• Each module covers only a single domain• Understandable by an average human

– Step 2: NIFSTD: standardize modules under same upper ontology• OBO compliant in OWL

– Step 3: Create intra-domain and more useful hierarchies using properties and restrictions

– Brain partonomy

– Step 4: Bridge two or more domains using a standard set of relations– Neuron to brain region– Neuron to molecule, e.g., GABAergic neuron

Page 34: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Neurolex• More human centric• More stable class structure

– Ontologies take time; many versions are retired-not good for information systems that are using identifiers

• Synonyms and abbreviations were essential for users• Can’t annotate if they can’t find it• Can’t use it for search if they can’t find it• Facilitates semi-automated mapping

• Contains subsets of ontologies that are useful to neuroscientists– e.g., only classes in Chebi that neuroscientists use

• Wanted the community to be able to see it and use it– Simple understandable hierarchies

• Removed the independent continuants, entity etc• Used labels that humans could understand

– Even if they were plural (meninges vs meninx)

Page 35: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Maintaining multiple versions

• NIF maintains the NIF vocabularies in different forms for different purposes– Neurolex Wiki: Lexicon for community review and

comment– NIFSTD: set of modular OWL files normalized

under BFO and available for download– NCBO Bioportal for visibility and mapping services– Ontoquest: NIF’s ontology server

• Relational store customized for OWL ontologies• Materialized inferred hierarchies for more efficient

queries

Page 36: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NeuroLex Wiki

http://neurolex.org Stephen Larson

“The human interface”Semantic media wiki

Page 37: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF Architecture

Gupta et al., Neuroinformatics, 2008 Sep;6(3):205-17

Page 38: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Summary

• NIF has tried to adopt a flexible, practical approach to assembling, extending and using community ontologies – We use a combination of search strategies: string based, lexicon-based,

ontology-based– We believe in modularity– We believe in starting simple and adding complexity– We believe in single asserted hierarchies and multiple inferred hierarchies– We believe in balancing practicality and rigor

• NIF is working through the International Neuroinformatics Coordinating Facility (INCF) to engage the community to help build out the Neurolex and start adding additional relations– Neuron registry task force– Structural lexicon

Page 39: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Where do we go from here?• NIF 2.5

– Automatic expansion of logically defined terms

• More services– NIF vocabulary

services are available now

• More content• More, more, more NIF Blog

NIF is learning lessons in practical data integration

Page 40: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

Musings from the NIF…• No single approach, technology, philosophy, tool,

platform will solve everything• The value of making resources discoverable is not

appreciated• Developing resources (tools, databases, data) that are

interoperable at this point is an act of will• Decisions can be made at the outset that will make it easier or harder to

integrate• We build resources for ourselves and our constituents, not

automated agents• We get mad when commercial providers don’t make their products

interoperable• Many times the choice of terminology is based on expediency or who

taught you biology rather than deep philosophical differences

• Ontologies, terminologies, lexicons, thesauri• Developing a semantic framework on top of unstable resources is difficult• Need to adopt a flexible approach

Page 41: The Neuroscience Information Framework Establishing a practical semantic framework for neuroscience Maryann Martone, Ph. D. University of California, San

NIF Evolution

V1.0: NIFNIF 1.5+++++

Then Now Later