Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition
How Darwin Core Archives have changed the landscape of biodiversity data publishing
John Wieczorek ([email protected])
Information Architect
Museum of Vertebrate Zoology, UC Berkeley
Buenos Aires (Argentina)
28 September 2011
Background: Data Exchange
ABCD (TDWG Standard)
• > 1200 concepts
• XML
• Shared via BioCase, Tapir
Darwin Core (pre-standard v. 1.2, 47 versions)
• 48 concepts, specimens
• XML
• Shared via DiGIR
Darwin Core (pre-standard v. 1.4)
• 46 concepts (plus extensions), specimens
• XML
• Shared via Tapir
Darwin Core (TDWG Standard)
• 172 concepts (156 in Simple Darwin Core), biodiversity data
• CSV, XML, RDF, JSON, …
• Shared via text files, Tapir, Darwin Core Archive…
Darwin Core Archive
Primary Biodiversity Data
Taxonomic Data
Metadata
http://www.someplace.org/data.zip
Darwin Core Archive: Complete Package
• Standard Darwin Core terms in a single, self-contained dataset
• Taxon records or occurrence records
• Data set metadata in EML
• Simple format (text files)
• Efficient harvesting (single file)
• Efficient storage (compressed)
• Easy access (no special software required)
• Extensible (related files in one archive)
Darwin Core Archive: Benefits
Preferred format for publishing data in the GBIF network
Ecological Metadata Language (EML)
• Title and Abstract
• Citation and Attribution
• Contact and Authors
• Geographic Scope
• Sampling Methods
• Bibliography
• and more…
For describing data sets – even unpublished ones
Core data file types
Records based on taxa – one species per row
OR
Records based on species occurrences – one occurrence per row
Columns that do not match a Darwin Core term may be included, but are ignored
“Wingspan” is not a Darwin Core term
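The rule above can be sketched in a few lines of Python. This is only an illustration: `DWC_TERMS` is a small hand-picked subset of Darwin Core terms (the standard defines many more), and `split_columns` and the header row are hypothetical names made up for the example.

```python
# A small, hand-picked subset of Darwin Core terms (assumption: the full
# standard defines many more; this set is for illustration only).
DWC_TERMS = {"scientificName", "family", "genus", "country", "locality", "eventDate"}


def split_columns(header):
    """Return (recognised, ignored) column names, preserving their order."""
    recognised = [name for name in header if name in DWC_TERMS]
    ignored = [name for name in header if name not in DWC_TERMS]
    return recognised, ignored


# Hypothetical header row from a core data file.
header = ["scientificName", "family", "Wingspan"]
recognised, ignored = split_columns(header)
print("Darwin Core columns:", recognised)
print("Ignored columns:", ignored)
```

A consumer of the archive would read only the recognised columns; the extra column stays in the file but plays no part in the exchange.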
Darwin Core Archive: Anatomy
1) Rename columns in text file
Two ways to match columns to Darwin Core terms
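As a rough sketch of the first option, the header row of a tab-delimited file can be rewritten so that each column carries a Darwin Core term name. The mapping `COLUMN_TO_TERM`, the helper `rename_header`, and the sample data are all assumptions invented for this example.

```python
import csv
import io

# Hypothetical mapping from local column names to Darwin Core term names.
COLUMN_TO_TERM = {"Species": "scientificName", "Family": "family"}


def rename_header(text, mapping):
    """Return tab-delimited text with the header row renamed via `mapping`.

    Columns without an entry in `mapping` keep their original name.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical file content: one header row and one data row.
original = "Species\tFamily\tWingspan\nVultur gryphus\tCathartidae\t3.0\n"
print(rename_header(original, COLUMN_TO_TERM))
```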
Darwin Core Archive: Anatomy
2) Match columns to terms in a separate meta.xml file
Two ways to match columns to Darwin Core terms
Darwin Core Archive: Anatomy
meta.xml matches the columns in the core data file (species.txt)
More on how to make the meta.xml file later…
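To give a feel for what the matching file contains, here is a minimal Python sketch that writes a meta.xml for a taxon core file. It assumes a tab-delimited species.txt with one header line and the record identifier in the first column; the helper `build_meta_xml` and the choice of columns are hypothetical, and a real archive may need more attributes than shown here.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"    # namespace of meta.xml
DWC_TERMS_NS = "http://rs.tdwg.org/dwc/terms/"  # namespace of Darwin Core terms


def build_meta_xml(filename, columns):
    """Build meta.xml as a string.

    `columns` maps a column index to a Darwin Core term name.
    Column 0 is assumed to hold the record identifier.
    """
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in the XML
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS_NS + "Taxon",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = filename
    ET.SubElement(core, "id", {"index": "0"})
    for index, term in sorted(columns.items()):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS_NS + term})
    return ET.tostring(archive, encoding="unicode")


# Hypothetical column layout for species.txt.
print(build_meta_xml("species.txt", {1: "scientificName", 2: "family"}))
```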
Darwin Core Archive: Anatomy
Archives can include extension files
Species.txt
Common_names.txt
Extensions allow multiple records to be linked to a core record.
Extensions link to the core through the core ID
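The link between core and extension can be sketched as a simple grouping by ID. The file contents below are invented for illustration, and the column name `id` and the helper names are assumptions rather than anything prescribed by the slides.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited contents; the first column holds the core ID.
species_txt = "id\tscientificName\n1\tPuma concolor\n2\tPanthera onca\n"
common_names_txt = "id\tvernacularName\n1\tpuma\n1\tcougar\n2\tjaguar\n"


def read_rows(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def link_extension(core_rows, extension_rows, id_column="id"):
    """Map each core ID to the list of extension rows that point at it."""
    by_id = defaultdict(list)
    for row in extension_rows:
        by_id[row[id_column]].append(row)
    return {row[id_column]: by_id.get(row[id_column], []) for row in core_rows}


links = link_extension(read_rows(species_txt), read_rows(common_names_txt))
for core_id, names in links.items():
    print(core_id, [n["vernacularName"] for n in names])
```

One core record can therefore carry any number of extension records, including none.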
Darwin Core Archive: Anatomy
The folder is zipped.
This is a Darwin Core Archive
• Data files
• Column matching file
• Data set documentation
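Zipping the folder needs nothing beyond a standard zip tool. A minimal Python sketch, assuming the three parts are available as text: `build_archive` is a hypothetical helper, the file name eml.xml is an assumed name for the data set documentation, and the XML contents are placeholders rather than valid documents.

```python
import io
import zipfile


def build_archive(files):
    """Zip a mapping of file name -> text content and return the zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


archive_bytes = build_archive({
    "species.txt": "id\tscientificName\n1\tPuma concolor\n",  # data file
    "meta.xml": "<archive/>",  # placeholder for the column matching file
    "eml.xml": "<eml/>",       # placeholder for the data set documentation
})

# Reopen the archive to confirm what it contains.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(archive.namelist())
```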
Darwin Core Archive: Anatomy
http://www.organisation.org/my_data.zip
Archives on a web server can be accessed by a URL. Share this URL to “publish” your data!
Darwin Core Archive: Publishing
• GBIF Darwin Core Archive Spreadsheet Templates:
  • data in a spreadsheet already
  • simple archive authoring
• IPT:
  • creating/managing archives for multiple data sets
  • managing archives for multiple organisations
  • metadata as GBIF Metadata Profile of EML
• Make Your Own:
  • automating archive generation
  • customisation
• Hosting center:
  • economy of scale
  • infrastructure and support
• Combinations…
Darwin Core Archive: Publishing Options