Community Standards and Tools for Biodiversity Science NIEHS Workshop for the Development of a Framework for Environmental Health Science Language September 15, 2014 Ramona Walls iPlant Collaborative [email protected]

The biodiversity standards landscape

ECOLOGY

field surveys, observations, environments

MUSEUM COLLECTIONS

specimens, checklists

GENOMICS

molecules, specimens

The biodiversity standards landscape

ECOLOGY

ISO/OGC O&M

OBO-E

MUSEUM COLLECTIONS

DwC/TDWG, GBIF

GENOMICS

GSC, INSDC

Changing landscape

ECOLOGY

GENOMICS

MUSEUM COLLECTIONS

Overview of existing standards

• Darwin Core (DwC)

• Minimum Information for any (x) Sequence (MIxS)

• Extensible Observation Ontology (OBO-E) and the Observations and Measurements (O&M) standard

Darwin Core

• Positives:

– widely used

– comprehensive

– available as RDF

• Negatives:

– ambiguous definitions

– no logical structures
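To make the "available as RDF" and "no logical structures" points concrete, here is a minimal sketch of a flat Darwin Core occurrence record serialized as N-Triples. The term names and namespace are Darwin Core's own; the record values and the `urn:example:` identifier are invented for illustration, and `to_ntriples` is a hypothetical helper rather than part of any Darwin Core tooling.

```python
# Darwin Core term namespace (published by TDWG).
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical occurrence record: every value below is invented for illustration.
record = {
    "occurrenceID": "urn:example:occurrence:1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Acropora hyacinthus",
    "eventDate": "2010-03-02",
    "decimalLatitude": "-17.5",
    "decimalLongitude": "-149.8",
}


def to_ntriples(rec):
    """Serialize a flat record as N-Triples, one literal-valued triple per term."""
    subject = "<" + rec["occurrenceID"] + ">"
    lines = []
    for term, value in rec.items():
        if term == "occurrenceID":
            continue  # used as the subject, not repeated as a property
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{DWC}{term}> "{escaped}" .')
    return lines


triples = to_ntriples(record)
for line in triples:
    print(line)
```

Every property hangs directly off the occurrence as a literal, which shows the flat, key-value character of the vocabulary: nothing in the record says how the specimen, the event, and the location relate to one another.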

Minimum Information for any (x) Sequence (MIxS)

MIxS

• Positives:

– easy to use

• truly minimal

• spreadsheet is familiar

– available as RDF

• Negatives:

– no logical structures

– minimum not always sufficient for re-use
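The checklist idea can be sketched as a simple completeness check over spreadsheet rows. The field names below follow MIxS-style terms, but which of them count as required is an assumption made for this example, not the official checklist, and `missing_fields` is a hypothetical helper.

```python
# Illustrative subset of MIxS-style field names; treating exactly these as
# "required" is an assumption for this sketch, not the official checklist.
REQUIRED_FIELDS = (
    "project_name",
    "collection_date",
    "lat_lon",
    "geo_loc_name",
    "seq_meth",
)


def missing_fields(row):
    """Return the required fields that are absent or blank in one spreadsheet row."""
    return [f for f in REQUIRED_FIELDS if not str(row.get(f, "")).strip()]


# Hypothetical rows, shaped like the dicts csv.DictReader would produce.
rows = [
    {"project_name": "soil survey", "collection_date": "2013-06-01",
     "lat_lon": "40.0 -105.0", "geo_loc_name": "USA: Colorado",
     "seq_meth": "Illumina"},
    {"project_name": "soil survey", "collection_date": "",
     "lat_lon": "40.0 -105.0"},
]

report = [missing_fields(r) for r in rows]
print(report)
```

A row that passes such a check is complete with respect to the minimum, which is a weaker property than being sufficient for re-use.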

The Extensible Observation Ontology (OBO-E) is an ontology that is consistent with the Observations and Measurements (O&M) schema

Madin et al. 2007 Ecol. Informatics doi: 10.1016/j.ecoinf.2007.05.004

OBO-E:

O&M:

OBO-E with extensions for modeling data on crabs that live on corals measured along a transect

Madin et al. 2007 Ecol. Informatics doi: 10.1016/j.ecoinf.2007.05.004
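The crab, coral, and transect example can be sketched as nested observations, loosely following the observation, entity, measurement, characteristic, standard, and context pattern described by Madin et al. The class and field names here are simplified assumptions rather than the ontology's own identifiers, and the measured values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    characteristic: str  # what was measured, e.g. "count"
    value: float
    standard: str        # unit or reference standard for the value


@dataclass
class Observation:
    entity: str  # the thing observed
    measurements: List[Measurement] = field(default_factory=list)
    # The observation this one was made within, if any.
    context: Optional["Observation"] = None


def context_chain(obs):
    """List entity names from an observation outward through its contexts."""
    chain = []
    while obs is not None:
        chain.append(obs.entity)
        obs = obs.context
    return chain


# Hypothetical data: crabs counted on a coral colony located along a transect.
transect = Observation("transect", [Measurement("length", 50.0, "meter")])
coral = Observation("coral colony", [Measurement("diameter", 0.3, "meter")],
                    context=transect)
crab = Observation("crab", [Measurement("count", 4.0, "individual")],
                   context=coral)

print(context_chain(crab))
```

Making context an explicit link is what lets a query ask for crab counts within a given transect, something a flat record cannot express without extra conventions.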

OBO-E

• Positives:

– consistent with widely used standard (O&M)

– logical structure

– extensible to many kinds of observations

• Negatives:

– not easily accessible to many scientists

– not yet widely used or tested

– not yet aligned with other biological ontologies

Understanding GxPxE requires us to work at the intersection of disciplines

ECOLOGY

GENOMICS

MUSEUM COLLECTIONS

The Biological Collections Ontology (BCO) aims to integrate biodiversity data across sources and sub-disciplines

Figure: a Moorea Biocode bioinventory event linked to museum specimens, a tissue sample at the Smithsonian Institution, a gut sample, metagenomic sequences at the CAMERA portal, a GenBank sequence, a digital image stored on Morphbank, and an identification.

John Deck, John Wieczorek, Robert Guralnick + many more!

Challenge: getting existing datasets into a form where we can query them the way we want
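One way to picture the kind of query this challenge refers to is tracing a sequence back to the collecting event it came from. The sketch below uses a plain "derived from" mapping as a stand-in for the ontology's actual relations. The identifiers are hypothetical, and the specific links between items in the Moorea Biocode example are assumed.

```python
# Each key is a hypothetical identifier; each value names what it was derived
# from. The links are assumptions made for illustration.
derived_from = {
    "genbank_sequence": "tissue_sample",
    "metagenomic_sequences": "gut_sample",
    "tissue_sample": "museum_specimen",
    "gut_sample": "museum_specimen",
    "digital_image": "museum_specimen",
    "museum_specimen": "bioinventory_event",
}


def trace_to_origin(item, links):
    """Follow derived-from links until reaching an item with no recorded source."""
    path = [item]
    seen = {item}
    while path[-1] in links:
        nxt = links[path[-1]]
        if nxt in seen:
            raise ValueError("cycle in derivation links")
        seen.add(nxt)
        path.append(nxt)
    return path


def products_of(origin, links):
    """Return every item whose derivation path leads back to `origin`."""
    return sorted(k for k in links if origin in trace_to_origin(k, links)[1:])


print(trace_to_origin("genbank_sequence", derived_from))
print(products_of("museum_specimen", derived_from))
```

Once datasets from different repositories share links of this kind, the same traversal answers questions in both directions: where a sequence came from, and everything that was derived from a given specimen.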

NEON soil sample data mapped to BCO

Scaling up: bringing the metagenomic data into the iPlant Data Commons

Bonnie Hurwitz, Ken Youens-Clark, iPlant staff

Step 1: validate workflow with existing data

Step 2: production workflow with “traditional” technology

alternative: leverage iRODS technology

iPlant Data Commons: workflows for the entire data lifecycle

project creation, specimen collection, analysis, publication

Acknowledgements