Page 1: Online tools and standards for Biodiversity data in the Semantic Web

Online tools and standards for Biodiversity data in the Semantic Web

Dr Dimitris Koureas

Biodiversity Informatics Group | Department of Life Sciences

The Natural History Museum London

Page 2

What is the semantic web?

[Diagram: two web resources, http://… and http://…]

Slide adjusted from Page R. presentation in pro-iBiosphere

Page 3

What is the semantic web?

[Diagram: the two web resources, http://… and http://…, joined by a link]

Slide adjusted from Page R. presentation in pro-iBiosphere

Page 4

What is the semantic web?

[Diagram: the two web resources, http://… and http://…, with the link between them labelled http://…]

Slide adjusted from Page R. presentation in pro-iBiosphere

Page 5

What is the semantic web?

[Diagram: the two web resources, http://… and http://…, with the link between them labelled http://…; annotations: person (Fred), "is an author of", book]

Slide adjusted from Page R. presentation in pro-iBiosphere
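
The statement on this slide can be read as a subject, predicate, object triple. Below is a minimal sketch in plain Python, not a real RDF library. The example.org URIs, the `Triple` type and the `objects` helper are assumptions made for illustration, since the slide elides its own URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Placeholder URIs (assumptions; the slide does not show the real ones).
FRED = "http://example.org/person/fred"
BOOK = "http://example.org/book/1"
IS_AUTHOR_OF = "http://example.org/terms/isAuthorOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph = [
    Triple(FRED, IS_AUTHOR_OF, BOOK),  # the typed link between two resources
    Triple(FRED, RDF_TYPE, "http://example.org/terms/Person"),
    Triple(BOOK, RDF_TYPE, "http://example.org/terms/Book"),
]


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


print(objects(graph, FRED, IS_AUTHOR_OF))
```

Because the link itself has a URI, a machine can tell "is an author of" apart from any other kind of connection between the same two resources.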

Page 6

The Semantic web:

“The future of the web…and always will be” – Peter Norvig (Google)

What is the semantic web?

Slide adjusted from Page R. presentation in pro-iBiosphere

Page 7

Biodiversity informatics

The study of the transformation and communication of information in Life and Earth sciences

provides the means (generating and enhancing the necessary infrastructure)

Page 8

Research vs Infrastructure

Slide adapted from Patterson D. 2013, Tempe, Arizona

Page 9

Research vs Infrastructure

Research: Discovery, Ephemeral, Individualistic, Massive redundancy, Optional, Risk taking

Slide adapted from Patterson D. 2013, Tempe, Arizona

Page 10

Research vs Infrastructure

Research: Discovery, Ephemeral, Individualistic, Massive redundancy, Optional, Risk taking

Infrastructure: Implementation, Communal / agreed, Essential, Persistent, Robust & reliable, Adaptable

Slide adapted from Patterson D. 2013, Tempe, Arizona

Page 11

What are the current challenges in Biodiversity informatics?

Page 12

Current taxonomic data production

Publications based on countless specimens, images, maps, keys and datasets

Typically generated by small communities for “local” research projects

Figure from Costello M.J. et al., 2013. doi: 10.1126/science.1230318

Page 13

Our current taxonomic data production

• 15-20k new spp. described annually (2M total) [1]
• 30k nomenclatural acts (12M total) [1]
• 20k phylogenies (750k total) [2]
• 31k taxa sequenced (360k taxa total) [3]
• 800k BioMed papers (40M total pp. of taxonomy) [4]
• Countless specimens, images, maps, keys and datasets

1.8M described spp. (17M names)

300M pages (over last 250 years)

1.5-3B specimens

Figures from [1] Zhang, Zootaxa 2011 4, 1-4; [2] Web-of-Science; [3] GenBank and [4] PubMed.

Page 14

Now imagine that…

Estimates of 7.5 million species still undescribed [1]

[1] Mora C. et al. How Many Species Are There on Earth and in the Ocean? doi:10.1371/journal.pbio.1001127

Page 15

Biodiversity informatics landscape

Key problems

• Landscape is complex, fragmented & hard to navigate
• Many audiences (policy makers, scientists, amateurs, citizen scientists)
• Many scales (global solutions to local problems)

Figure adapted from Peterson et al, Syst. & Biodiv. 2010. doi: 10.1080/14772001003739369

Page 16

Science is carried out “locally”

• By local scientists
• Being part of local infrastructures
• Having local funders

BUT

Science is global

• It needs global standards
• Global workflows
• Cooperation of global players

Page 17

Expected volume of taxonomic and biodiversity data

Need for extracting, aggregating and linking data on a global level

Page 18

To achieve this…

“Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses”

Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE doi:10.1016/j.tree.2011.11.001

This requires data, information & knowledge to be…

• Digital: not printed paper
• Openly accessible: not behind barriers (e.g. paywalls)
• Linked-up: not in silos

Page 19

Hour-glass motif for big data infrastructure

Data re-use

Data generation

Data pool

Slide adapted from Patterson D. 2013, Tempe, Arizona

Page 20

Big data world with re-use data

Data generation: Models, Observations, Experiments, Processed

Data pool

Data re-use: Aggregation, Visualization, Analysis, Manipulation

Steps shown on the diagram:

• Make accessible, Normalize data, Structure data, Make data digital
• Distribute, Make discoverable and actionable, Atomize, Standardize (metadata, ontology), Use stable UUIDs to identify content, Preserve, Federate, Register
• Re-use, Quality enhancement

Page 21

Big data world with re-use data

Data generation: Models, Observations, Experiments, Processed

Data pool

Data re-use: Aggregation, Visualization, Analysis, Manipulation

Page 22

Nodes are the essence of infrastructure

Nodes interconnected

• Dynamically interconnected
• Nodes with sub-discipline specific responsibilities
• Standard Exchange formats
• Using UUIDs to identify content
• Ontologies

Slide adapted from Patterson D. 2013, Tempe, Arizona
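
As one possible illustration of "using UUIDs to identify content", the sketch below derives a stable, name-based UUID for each record held by a node, using Python's standard `uuid` module. The namespace URL and the record keys are hypothetical, and a real node might equally well issue random UUIDs and store them.

```python
import uuid

# Hypothetical namespace for one data node (an assumption for illustration).
NODE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/my-node/")


def record_uuid(local_key: str) -> uuid.UUID:
    """Derive a stable UUID (name-based, version 5) from a node-local record key."""
    return uuid.uuid5(NODE_NAMESPACE, local_key)


a = record_uuid("specimen-0001")
b = record_uuid("specimen-0001")
c = record_uuid("specimen-0002")

# The same key always yields the same UUID; different keys yield different ones.
print(a == b, a == c)
```

The point of the name-based variant is that the identifier can be regenerated from the record key, so two exports of the same record carry the same identifier.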

Page 23

But how many biodiversity informatics projects are out there?

Page 24

But how many biodiversity informatics projects are out there?

At least 679!

Sources: EDIT, TDWG & ViBRANT 2013

Categories:

Data Aggregator - a web site that collates data from a variety of sources (digital and hardcopy) and presents it in one form

Data Indexer - a web site that provides lists or indexes of other sites that provide data

Data Provider - a web site that provides data directly from research or other studies

Data Standards - a web site that contributes to formulating or developing standards for data

Facilitator - a web site that facilitates the provision of data by other projects or web sites

Page 25

GBIF: Our global leader in occurrence data

Aggregators

Page 26

EU-NOMEN - PESI

http://www.eu-nomen.eu/portal/

Aggregators

Page 27

Making taxonomy digital, open & linked

Aggregators

Page 28

Scratchpads are an integrated system to Enter, Curate, Mark-up, Link and Publish data

Taxonomic workflow in a single virtual environment

Page 29

The Scratchpads concept

A Scratchpad is a website that holds data for you and your community

[Diagram labels: Your data; External data & services]

Page 30

580 Scratchpads Communities by 8,185 active registered users covering 55,607 taxa in 653,274 pages.

65,000 unique visitors/month

In total more than 1,300,000 visitors

[Chart: unique visitors per month to Scratchpads sites]

Page 31

BOLD: Barcode of Life Data Systems

Researchers can assemble, test, and analyse their data records in BOLD before uploading them to the International Nucleotide Sequence Database Collaboration (DDBJ, ENA, GenBank)

Facilitators

Page 32

Biodiversity Heritage Library (BHL)

Biodiversity literature openly available to the world as part of a global biodiversity community

http://www.biodiversitylibrary.org/

> 40 M pages of legacy literature

Providers

Page 33

Standard Exchange formats

Page 34

Darwin Core (DwC)

http://rs.tdwg.org/dwc/index.htm

Primarily used as a specimen records metadata standard

Standard Exchange formats
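
To give a feel for what a Darwin Core specimen record looks like, here is a small sketch that writes one flat record to CSV with Darwin Core term names as column headers. The term names are Darwin Core terms; every value, including the identifier and the institution code, is invented sample data, and the `to_csv` helper is a hypothetical name.

```python
import csv
import io

# Darwin Core term names as keys; all values are invented sample data.
record = {
    "occurrenceID": "urn:uuid:00000000-0000-0000-0000-000000000001",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "EX",
    "catalogNumber": "EX-0001",
    "scientificName": "Thymus vulgaris L.",
    "eventDate": "2013-05-14",
}


def to_csv(records):
    """Serialise flat records to CSV text with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


text = to_csv([record])
print(text)
```

Because every provider uses the same column names, an aggregator can merge files from many sources without a per-source mapping step.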

Page 35

Access to Biological Collection Data (ABCD)

http://www.tdwg.org/standards/115/

Highly detailed; aims to provide a complete set of data elements for natural history collection items

Standard Exchange formats

Page 36

Audubon Core Multimedia Resources Metadata Schema

http://www.tdwg.org/standards/638/

The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections.

Standard Exchange formats

Page 37

Taxonomic Concept Transfer Schema (TCS)

http://tdwg.napier.ac.uk/index.php?pagename=HomePage

Mechanism to exchange data concerning the names of organisms

Standard Exchange formats

Page 38

Standards facilitate systems interoperability

Page 39

UPIDs (unique, persistent identifiers) to identify content

Identifiers: a key to find something in a database.

We need Unique Identifiers

Page 40

10.4289/0013-8797.115.1.75

We need Unique Identifiers

Page 41

http://hdl.handle.net/10.4289/0013-8797.115.1.75

http://dx.doi.org/10.4289/0013-8797.115.1.75

http://www.google.co.uk/search?q=10.4289/0013-8797.115.1.75

http://zoobank.org/10.4289/0013-8797.115.1.75

We need Unique Identifiers
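
The four URLs on this slide all carry the same identifier string. A minimal sketch of pulling it back out, assuming a deliberately loose DOI pattern (the `DOI_PATTERN` regex and the `extract_doi` helper are illustrative, not an official DOI grammar):

```python
import re

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s?&#]+")

urls = [
    "http://hdl.handle.net/10.4289/0013-8797.115.1.75",
    "http://dx.doi.org/10.4289/0013-8797.115.1.75",
    "http://www.google.co.uk/search?q=10.4289/0013-8797.115.1.75",
    "http://zoobank.org/10.4289/0013-8797.115.1.75",
]


def extract_doi(url):
    """Return the first DOI-like string found in `url`, or None."""
    match = DOI_PATTERN.search(url)
    return match.group(0) if match else None


found = {extract_doi(u) for u in urls}
print(found)
```

All four URLs reduce to a single identifier, which is what makes it usable as a key regardless of which service is asked to resolve it.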

Page 42

Can a taxonomic name be used as a UPID?

• Is it Unique?
• Is it Persistent?
• Is it an Identifier?

Are taxonomic names enough for communication between Scientists? YES

Are taxonomic names enough for communication between machines? CAN BE IF

We need Unique Identifiers

Page 43

For example:

Page R., Brief Bioinform (2008) 9 (5): 345-354. doi: 10.1093/bib/bbn022

We need Unique Identifiers

Page 44

ONLY IF Name reconciliation

Patterson, D. J. et al. 2010. Names are key to the big new biology. TREE 25: 686-691 doi: 10.1016/j.tree.2010.09.004

We need Unique Identifiers
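
Name reconciliation means mapping the many strings used for a taxon onto one identifier. The sketch below is a toy version under stated assumptions: the lookup table, the `urn:example` identifier and both helper functions are hypothetical, and real reconciliation services handle authorship, misspellings and homonyms far more carefully.

```python
import re

# Hypothetical reconciliation table: normalised name -> identifier.
# The identifier is a placeholder, not one issued by any real registry.
KNOWN_NAMES = {
    "puma concolor": "urn:example:taxon:1",
    "felis concolor": "urn:example:taxon:1",  # older combination, same species
}


def normalise(name):
    """Collapse whitespace, keep genus + species epithet only, lower-case."""
    words = re.sub(r"\s+", " ", name.strip()).split(" ")
    return " ".join(words[:2]).lower()


def reconcile(name):
    """Map a name string to an identifier, or None if it is unknown."""
    return KNOWN_NAMES.get(normalise(name))


for raw in ["Puma concolor", "  puma   CONCOLOR (Linnaeus, 1771)", "Felis concolor"]:
    print(repr(raw), "->", reconcile(raw))
```

Once the variants share one identifier, machines can join records on the identifier rather than on the name string.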

Page 45

The need for Controlled Vocabularies and Ontologies

Knowledge Organisation Systems

Google has done it: http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html

Ontologies

Plant anatomical and structural development Ontology: http://www.plantontology.org/

Page 46

Deans A. et al. Time to change how we describe biodiversity, Trends in Ecology & Evolution 2012. doi:10.1016/j.tree.2011.11.007

Example of ontology usage
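
A controlled vocabulary can be pictured as a table that maps free-text words onto agreed term identifiers. The sketch below assumes a made-up two-term vocabulary; the `EX:` identifiers, labels and synonyms are placeholders and are not taken from the Plant Ontology or any other published ontology.

```python
# Hypothetical controlled vocabulary: term ID -> preferred label and synonyms.
VOCABULARY = {
    "EX:0001": {"label": "leaf", "synonyms": {"leaves", "foliage"}},
    "EX:0002": {"label": "flower", "synonyms": {"flowers", "blossom"}},
}


def build_index(vocabulary):
    """Index every label and synonym (lower-cased) by its term ID."""
    index = {}
    for term_id, entry in vocabulary.items():
        for text in {entry["label"], *entry["synonyms"]}:
            index[text.lower()] = term_id
    return index


INDEX = build_index(VOCABULARY)


def annotate(free_text):
    """Return the controlled term ID for a free-text word, or None."""
    return INDEX.get(free_text.strip().lower())


print(annotate("Leaves"), annotate("blossom"), annotate("stem"))
```

Descriptions annotated this way can be compared by term ID, so "leaves" in one dataset and "foliage" in another are recognised as the same thing.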

Page 47

Examples of integrated projects

http://protectedplanet.net

http://thymus.myspecies.info

Page 48

How is all this relevant to my work?

What should I take home?

Page 49

Repositories

#bigdata

Providers

Data silos

Community

Page 50

The four nodes of data workflow

1. We collect and generate data

2. We curate, link and structure data

3. We analyse data

4. We publish data

Page 51

The four nodes of data workflow

Data collection & generation

Data curation

Data analysis

Data publishing

What are the bottlenecks in the workflow?

Page 52

Data collection & generation

Data curation

Data analysis

Data publishing

What we need is… a seamless workflow

Page 53

Old Joke

A drunk is crawling around a lamp post on his hands and knees.

A cop comes along …

Cop: What are you doing?

Drunk: Looking for my car keys.

Cop: Are you sure you dropped them here?

Drunk: No, I dropped them in the alley.

Cop: So why are you looking here?

Drunk: Because the light’s better.

Page 54

Old Joke

Science is a ‘light’s better’ endeavor in that research effort is not directed at areas where the work is technically infeasible.

Research is directed where real, interpretable results may be obtained.

We do, in fact, conduct research where the light’s better.

But, when the light changes, so does science.

With better illumination, we look in new areas.

We find new things…

Page 55

Addressing the challenges of biodiversity informatics

“…the field [of biodiversity informatics] appears to be growing in a void of overarching, motivating questions, effectively making it a set of technologies in search of questions to address.”

Peterson et al, Syst. & Biodiv. 2010. doi: 10.1080/14772001003739369