16
1/21/10 1 The Neuroscience Information Framework Practical experiences in using and building community ontologies Maryann Martone, Ph. D. University of California, San Diego

The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

Embed Size (px)

Citation preview

Page 1: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

1

The Neuroscience Information Framework

Practical experiences in using and building community ontologies

Maryann Martone, Ph. D. University of California, San Diego

Page 2: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

2

The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience

http://neuinfo.org

UCSD, Yale, Cal Tech, George Mason, Washington Univ

Supported by NIH Blueprint

  A portal for finding and using neuroscience resources

  A consistent framework for describing resources

  Provides simultaneous search of multiple types of information, organized by category

  Supported by an expansive ontology for neuroscience

  Utilizes advanced technologies to search the “hidden

Where do I find…

•  Data •  Software tools •  Materials •  Services •  Training •  Jobs •  Funding

opportunities

•  Websites •  Databases •  Catalogs •  Literature •  Supplementary

material •  Information portals

...And how many are there?

Page 3: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

3

NIF in action

Query expansion: Synonyms and related

concepts Boolean queries

Data sources categorized by “data type” and level of nervous

system

Simplified views of complex data

sources

Tutorials for using full resource when getting there from

NIF

Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”

Page 4: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

4

NIF searches across multiple sources of information

• NIF data federation • Independent databases registered with NIF or through web services

• NIF registry: catalog • NIF Web: custom web index (work in progress) • NIF literature: neuroscience-centered literature corpus (Textpresso)

Guiding principles of NIF •  Builds heavily on existing technologies (open source

tools) •  Information resources come in many sizes and flavors •  Framework has to work with resources as they are, not

as we wish them to be –  Federated system; resources will be independently maintained –  Very few use standard terminology or map to ontologies

•  No single strategy will work for the current diversity of neuroscience resources

•  Trying to design the framework so it will be as broadly applicable as possible to those who are trying to develop technologies

•  Interface neuroscience to the broader life science community

•  Take advantage of emerging conventions in search and

Page 5: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

5

Registering a Resource to NIF

Level 1 NIF Registry: high level descriptions from NIF vocabularies supplied by human curators

Level 2 Access to deeper content; mechanisms for query and discovery; DISCO protocol

Level 3 Direct query of web accessible database Automated registration Mapping of database content to NIF vocabulary by human

The NIF Registry

• Very, very simple model • Annotated with NIF vocabularies

• Resource type • Organism

• Reviewed by NIF curators

Page 6: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

6

Modular ontologies for neuroscience

  NIF covers multiple structural scales and domains of relevance to neuroscience   Incorporated existing ontologies where possible; extending them for neuroscience where

necessary   Having community ontologies allowed NIF to implement terminology services very quickly   Normalized under the Basic Formal Ontology: an upper ontology used by the OBO Foundry   Based on BIRNLex: Neuroscientists didn’t like too many choices

NIFSTD

NS Function Molecule Investigation Subcellular Anatomy

Macromolecule Gene

Molecule Descriptors

Techniques

Reagent Protocols

Cell

Instruments

Bill Bug

NS Dysfunction Quality Macroscopic

Anatomy Organism

Resource

How are ontologies used?

•  Search: query expansion – Synonyms – Related classes –  “concept based queries”

•  Annotation: – Resource categorization – Entity mapping

•  Ranking of results – NIF Registry; NIF Web

Page 7: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

7

Concept-based search: Entity mapping

Brodmann area 3 Brodmann.3

• NIF indexes over 18 million records; most are not annotated • Synonyms and explicit mapping of database content help smooth over terminology differences and custom terminologies • NIF TIS: Ontology mappings

Concept-based query: GABAergic neuron

• Not what I say; what I mean

• Simple search will not return examples of GABAergic neurons unless the data are explicitly tagged as such

• Too many possible classifications to get them all by keywords

• Classes are logically defined in NIF ontology

• i.e., a GABAergic neuron is any neuron that uses GABA as a neurotransmitter

Page 8: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

8

Building NIF ontologies: Balancing act

•  Different schools of thought as to how to build vocabularies and ontologies

•  NIF is trying to navigate these waters, keeping in mind: –  NIF is for both humans and machines –  Our primary concern is data –  We have to meet the needs of the community –  We have a budget and deadlines

•  Building ontologies is difficult even for limited domains, never mind all of neuroscience, but we’ve learned a few things –  Reuse what’s there: trying to re-use URI’s rather than map when

possible –  Make what you do reusable: adopt best practices where feasible

•  Numerical identifiers, unique labels, single asserted simple hierarchies –  Engage the community –  Avoid “religious” wars: separate the science from the informatics –  Start simple and add more complexity to support reuse by other

communities and to confront the realities of ontology construction

Adding complexity –  Step 1: Build core lexicon (NeuroLex)

•  Classes and their definitions –  Import from existing ontologies (but only those classes we need)

•  Simple single inheritance and non-controversial hierarchies •  Each module covers only a single domain •  Understandable by an average human

–  Step 2: NIFSTD: standardize modules under same upper ontology •  OBO compliant in OWL •  Feed classes to ontology creators

–  Step 3: Create intra-domain and more useful hierarchies using properties and restrictions

–  Brain partonomy

–  Step 4: Bridge two or more domains using a standard set of relations –  Contained In a separate file called a bridge file –  Neuron to brain region –  Neuron to molecule, e.g., GABAergic neuron

Page 9: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

9

Maintaining multiple versions

•  NIF maintains the NIF vocabularies in different forms for different purposes – Neurolex Wiki: Lexicon for community review

and comment – NIFSTD: set of modular OWL files normalized

under BFO and available for download – NCBO Bioportal for visibility and mapping

services – Ontoquest: NIF’s ontology server

•  Relational store customized for OWL ontologies •  Materialized inferred hierarchies for more efficient

queries

NeuroLex Wiki

http://neurolex.org Stephen Larson

“The human interface” Semantic media wiki

Page 10: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

10

Neurolex: “OBO lite” •  More human centric •  More stable class structure

–  Ontologies take time; many versions are retired-not good for information systems that are using identifiers

•  Synonyms and abbreviations were essential for users •  Can’t annotate if they can’t find it •  Can’t use it for search if they can’t find it •  Used terms that humans could understand

–  Even if they were plural (meninges vs meninx) •  Facilitates semi-automated mapping

•  Contains subsets of ontologies that are useful to neuroscientists –  e.g., only classes in Chebi that neuroscientists use

•  Wanted the community to be able to see it and use it –  Simple understandable hierarchies

•  Removed the independent continuants, entity etc •  Used them for guidance but we often lack the time and expertise to figure out

distinctions

Page 11: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

11

NIF Resource Descriptors •  Working with NCBC

(Biomedical Resource Ontology) and NITRC to come up with single resource ontology and information model

•  Reconciling current versions; moving forward jointly

•  Same classes, different views

Peter Lyster, Csongor Nyulas, David Kennedy, Maryann Martone, Anita Bandrowski

Building from the NIF •  NIFSTD module: NIF investigation

•  Based on BFO and OBI (now apparently, IAO) •  Contains objects that are related to resource

types, e.g., software application; instrument

•  NIFSTD module: NIF Resource – Resource categories, e.g., software

resource – Objects from NIFSTD reclassified under

resource categories using very simple logical restrictions

•  NIF resource browser interface – Assigns alternate labels that are easier for

users to understand

Page 12: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

12

Structured knowledge source

Display label = Database or

ontology

NIFSTD: Investigation

NIFSTD: Resource (defined by restrictions in terms of NIFSTD Investigation)

NIF portal user interface

One area where NIF is actively building an ontology: Neurons and glia Classify them based on properties: many and varied

Page 13: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

13

Reclassification based on logical definitions: GABA neuron is any member of class neuron that has neurotransmitter GABA; Hippocampal neuron is any member of the class neuron whose soma is part of any part of the hippocampal formation

NIFSTD Bridge File

• NIF cell • NIF molecule • NIF anatomy

• Define a set of properties that relate neuron classes to molecule classes, e.g.,

• Neuron • Has neurotransmitter

• Purkinje cell is a Neuron • Purkinje cell has neurotransmitter GABA

• Neuron has part soma • Soma is part of brain region

NIF Cell Ontology

Working with community ontologies: NIF Molecule

• Intent: CHEBI, PRO • CHEBI too large and undergoing reorganization; PRO not ready and didn’t have any molecules that neuroscientists cared about, e.g., ion channels

• Imported IUPHAR classification of ion channels and G protein coupled receptors

• Gave to PRO but they gave new identifiers • Neuroscientists less concerned about structure than function, roles, dispositions

• e.g., Potassium channels, neuropeptides, drugs of abuse • Can’t get those from current hierarchies • We can’t really tell the difference • Relations ontology evolving slowly • Solution: Take ChEBI, PRO, NIF and add relations in a bridge file to allow us to do what we need for our constituents

• NIF Molecule: • Classes imported from Chebi as needed (Mereot approach) • Created molecule roles • Assign: Has role relationship in NIF (simplification of a complex expression)

Page 14: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

14

NIF Dysfunction

• Did not import DO in previous form as it was being redone • Created NIF dysfunction from nervous system diseases from MeSH, NINDS terminology and OMIM • ~350 classes • Primarily classified according to region affected (eye vs nervous system) and then symptoms • Have not re-organized according to latest BFO; waiting for guidance on this • Do need conditions beyond diseases though

• Mental illness • Addiction • Obesity

Disease phenotype ontologies

• Groups (us included) are building phenotype ontologies from the basic NIF modules • “Application ontologies”

Phenotype Organism

Disease

Stage

Entity

Protocol Quality

Information Artifact

has part

is bearer of

inheres in

participates in

derives into derives into

is bearer of

has stage

Sarah Maynard, Chris Mungal, Suzie Lewis

Page 15: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

15

The search for animal models starts here   A directory of available disease models   A single search across all known animal model

databases   “…reference materials, information about the resources,

and information about the animal models themselves”   Start with mouse, zebrafish, and non-human primates   Expand “to other species and along other dimensions”   Build on existing ontologies such as NeuroLex"

Summary •  NIF has tried to adopt a flexible, practical approach to assembling, extending and

using community ontologies –  We use a combination of search strategies: string based, lexicon-based, ontology-based –  Some strategies are based on practicalities, e.g., limitations with working with current

tools, but which we hope will support orderly evolution •  Lots of legacy slop

–  We believe in modularity –  We believe in starting simple and adding complexity so that we can all work off the same

base and add customization as needed while ontologists and scientists are grappling with the nature of reality

–  We believe in single asserted hierarchies and multiple inferred hierarchies –  Using upper ontologies enforces clarity of thinking and can serve as useful guides to

dedicated domain scientists but some issues require more thought than we can give –  We are establishing processes for working with ontology providers but it has to be two

ways

•  Stable identifiers are key –  We have added a lot of value to the base ontologies, some of which may not pass

muster but which is necessary for us to satisfy our constituents and funders •  Ontologies have to satisfy two communities: naïve users and ontologists; not

easy to do both with current tools •  Lexicons, thesauri, controlled vocabularies, ontologies: all have their place.

Would like them to work together.

Page 16: The Neuroscience Information Framework - DO-Wikido-wiki.nubic.northwestern.edu/images/b/b3/Martone_NIF_DO2.pdfThe Neuroscience Information Framework ... Supported by NIH Blueprint

1/21/10

16

NIF Evolution

V1.0: NIF NIF 1.5+++++

Then Now Later