Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition How Darwin Core Archives have changed the landscape of biodiversity data publishing John Wieczorek ([email protected]) Information Architect Museum of Vertebrate Zoology, UC Berkeley Buenos Aires (Argentina) 28 September 2011


Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition

How Darwin Core Archives have changed the landscape of biodiversity data publishing

John Wieczorek ([email protected])
Information Architect
Museum of Vertebrate Zoology, UC Berkeley

Buenos Aires (Argentina), 28 September 2011

Background: Data Exchange

ABCD (TDWG Standard)
• > 1200 concepts
• XML
• Shared via BioCase, Tapir

Darwin Core (pre-standard v. 1.2, 47 versions)
• 48 concepts, specimens
• XML
• Shared via DiGIR

Darwin Core (pre-standard v. 1.4)
• 46 concepts (plus extensions), specimens
• XML
• Shared via Tapir

Darwin Core (TDWG Standard)
• 172 concepts (156 in Simple Darwin Core), biodiversity data
• CSV, XML, RDF, JSON, …
• Shared via text files, Tapir, Darwin Core Archive…

Darwin Core Archive

Primary Biodiversity Data

Taxonomic Data

Metadata

http://www.someplace.org/data.zip

Darwin Core Archive: Complete Package

• Standard Darwin Core terms in a single, self-contained dataset

• Taxon records or Occurrence Records

• Data set metadata in EML

• Simple format (text files)

• Efficient harvesting (single file)

• Efficient storage (compressed)

• Easy access (no special software required)

• Extensible (related files in one archive)

Darwin Core Archive: Benefits

Preferred format for publishing data in the GBIF network

Darwin Core Archive: Anatomy

Archives always have a metadata file as EML

Ecological Metadata Language (EML)

• Title and Abstract
• Citation and Attribution
• Contact and Authors
• Geographic Scope
• Sampling Methods
• Bibliography
• and more…

For describing data sets – even unpublished ones

Darwin Core Archive: Anatomy

Archives always have a core data file as text

Core data file types

Records based on taxa – one species per row

Records based on species occurrences – one per row

OR

Darwin Core Archive: Anatomy

Archives always have a core data file as text

Core contains a “core ID” column, unique for every record in the file
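A minimal sketch of how the uniqueness of the core ID column could be checked, assuming a tab-delimited core file whose ID column is named `taxonID`. The sample rows and the function name `duplicate_core_ids` are hypothetical and serve only as illustration.

```python
import csv
import io
from collections import Counter


def duplicate_core_ids(text, id_column="taxonID", delimiter="\t"):
    """Return the core ID values that occur more than once in a core data file."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    counts = Counter(row[id_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical core file contents: the third data row reuses the ID "2".
sample_core = (
    "taxonID\tscientificName\n"
    "1\tPuma concolor\n"
    "2\tLynx rufus\n"
    "2\tLynx canadensis\n"
)

duplicates = duplicate_core_ids(sample_core)
print(duplicates)
```

An empty result means every record has its own core ID, which is what the archive requires.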

Darwin Core Archive: Anatomy

Columns are matched to Darwin Core terms

Darwin Core Archive: Anatomy

Columns that do not match a Darwin Core term may be included, but are ignored.

“Wingspan” is not a Darwin Core term
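As a rough illustration, a publisher could separate the columns that match Darwin Core terms from those that will be ignored. The set `KNOWN_TERMS` below is only a small subset of the standard chosen for this example, and `split_columns` is a hypothetical helper.

```python
# A small, illustrative subset of Darwin Core term names.
# The full standard defines many more terms than are listed here.
KNOWN_TERMS = {"scientificName", "family", "country", "eventDate", "recordedBy"}


def split_columns(header):
    """Split column names into those matching a known term and those that do not."""
    matched = [name for name in header if name in KNOWN_TERMS]
    ignored = [name for name in header if name not in KNOWN_TERMS]
    return matched, ignored


matched, ignored = split_columns(["scientificName", "country", "Wingspan"])
print("matched:", matched)
print("ignored:", ignored)
```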

Darwin Core Archive: Anatomy

Two ways to match columns to Darwin Core terms

1) Rename columns in text file

2) Match columns to terms in a separate meta.xml file

Darwin Core Archive: Anatomy

meta.xml matches the columns in the core data file (species.txt)

More on how to make the meta.xml file later…
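A minimal sketch of generating a meta.xml that matches columns by position, assuming a tab-delimited species.txt with one header line, the core ID in the first column, and a metadata file named eml.xml. The column layout and the helper name `build_meta_xml` are assumptions made for this example, not part of the original slides.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"  # namespace of Darwin Core terms


def build_meta_xml(core_file, id_index, field_terms):
    """Build a meta.xml string; field_terms maps column index -> term name."""
    archive = ET.Element(
        "archive",
        {"xmlns": "http://rs.tdwg.org/dwc/text/", "metadata": "eml.xml"},
    )
    core = ET.SubElement(
        archive,
        "core",
        {
            "rowType": DWC + "Taxon",
            "encoding": "UTF-8",
            "fieldsTerminatedBy": "\\t",  # written literally as \t in the XML
            "linesTerminatedBy": "\\n",
            "ignoreHeaderLines": "1",
        },
    )
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = core_file
    ET.SubElement(core, "id", {"index": str(id_index)})
    for index, term in sorted(field_terms.items()):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC + term})
    return ET.tostring(archive, encoding="unicode")


# Assumed layout: column 0 is the core ID, 1 the scientific name, 2 the family.
meta_xml = build_meta_xml("species.txt", 0, {1: "scientificName", 2: "family"})
print(meta_xml)
```

With this approach the column headers in species.txt can stay as they are, because the mapping lives entirely in meta.xml.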

Darwin Core Archive: Anatomy

Archives can include extension files

Species.txt

Common_names.txt

Extensions allow multiple records to be linked to a core record.

Extensions link to the core through the core ID
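The link between an extension and the core can be sketched as a lookup on the shared core ID. The file contents below are invented sample data, and the column names `taxonID`, `scientificName`, and `vernacularName` are assumed for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of the core file and of one extension file.
species_txt = (
    "taxonID\tscientificName\n"
    "1\tPuma concolor\n"
    "2\tLynx rufus\n"
)
common_names_txt = (
    "taxonID\tvernacularName\n"
    "1\tcougar\n"
    "1\tmountain lion\n"
    "2\tbobcat\n"
)


def read_rows(text):
    """Parse tab-delimited text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def names_by_species(core_rows, extension_rows, id_column="taxonID"):
    """Group extension records under the core record that shares their core ID."""
    grouped = defaultdict(list)
    for row in extension_rows:
        grouped[row[id_column]].append(row["vernacularName"])
    return {
        row["scientificName"]: grouped.get(row[id_column], [])
        for row in core_rows
    }


linked = names_by_species(read_rows(species_txt), read_rows(common_names_txt))
print(linked)
```

One core record can receive any number of extension records, which is how a single species row carries several common names.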

Darwin Core Archive: Anatomy

GBIF hosts extension definitions

http://rs.gbif.org/extension/

Multiple extension files can be linked to the core

Darwin Core Archive: Anatomy

All files are stored in a single folder

Darwin Core Archive: Anatomy

The folder is zipped.

This is a Darwin Core Archive
• Data files
• Column matching file
• Data set documentation
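A minimal sketch of the zipping step using only the Python standard library. The file contents are placeholders rather than valid meta.xml or EML documents, and `build_archive` is a hypothetical helper that works in memory instead of on a real folder.

```python
import io
import zipfile


def build_archive(files):
    """Zip a mapping of file name -> text content and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


archive_bytes = build_archive(
    {
        "species.txt": "taxonID\tscientificName\n1\tPuma concolor\n",  # data file
        "meta.xml": "<archive/>",  # placeholder for the column matching file
        "eml.xml": "<eml/>",  # placeholder for the data set documentation
    }
)

# Reopen the archive to confirm which files it contains.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = sorted(archive.namelist())
print(names)
```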

Darwin Core Archive: Anatomy

http://www.organisation.org/my_data.zip

Archives on a web server can be accessed by a URL. Share this URL to “publish” your data!
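From the consumer's side, accessing a published archive could look like the sketch below, which downloads the zip file and lists its contents using only the standard library. The function name `list_archive_files` is hypothetical, and the sketch assumes the URL points directly at the zip file.

```python
import io
import urllib.request
import zipfile


def list_archive_files(url):
    """Download a Darwin Core Archive from a URL and return the file names inside."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(archive.namelist())
```

Calling `list_archive_files` with the shared URL would return names such as the data files, meta.xml, and the EML document, with no special software beyond a zip reader.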

Darwin Core Archive: Publishing

Darwin Core Archive: Publishing Options

GBIF Spreadsheet Templates

Integrated Publishing Toolkit

Data Hosting Centers

Darwin Core Mapping Assistant

Metafile

http://tools.gbif.org/dwca-assistant/


• GBIF Darwin Core Archive Spreadsheet Templates:
  • data in a spreadsheet already
  • simple archive authoring

• IPT:
  • creating/managing archives for multiple data sets
  • managing archives for multiple organisations
  • metadata as GBIF Metadata Profile of EML

• Make Your Own:
  • automating archive generation
  • customisation

• Hosting center:
  • economy of scale
  • infrastructure and support

• Combinations…

Darwin Core Archive: Publishing Options
