IDs in and out of the database. Entomological Collections Network (ECN) 2012, November 10–11, Knoxville, TN. Debbie Paul, Greg Riccardi

  • Slide 1
  • IDs in and out of the database. Entomological Collections Network (ECN) 2012, November 10–11, Knoxville, TN. Debbie Paul, Greg Riccardi
  • Slide 2
  • Overview
      • What good is identification?
      • How are identifiers used by consumers
      • Providing IDs
      • Resolving IDs in a server
      • Strategies for storing IDs in databases
      • Linked Data
      • Annotations ~ all sorts
      • Feedback
  • Slide 3
  • What good is identification?
      • Aggregation
          • If you get info from 2 sources that are about the same object, you can combine the info
      • Resolution (finding information about an object)
          • Types of resolution
              • Determine where to get information
              • Determine how to get information
      • Providing information
          • How to create IDs
          • How to publish IDs
          • How to fetch database information for IDs
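The aggregation point can be sketched in a few lines: records from two sources are combined when their identifier strings are equal. This is only an illustration. The `aggregate` helper and the record field names are hypothetical, and the identifier and catalog values are borrowed from the specimen example on slide 18.

```python
from collections import defaultdict

def aggregate(*sources):
    """Merge records from several sources, keyed by identifier string."""
    merged = defaultdict(dict)
    for records in sources:
        for record in records:
            # Equal identifier strings are taken to mean "same object".
            fields = {k: v for k, v in record.items() if k != "id"}
            merged[record["id"]].update(fields)
    return dict(merged)

# Hypothetical records from two providers that describe the same specimen.
SPECIMEN_ID = "urn:uuid:424854d7-baec-42cf-a142-805b64117b9f"
source_a = [{"id": SPECIMEN_ID, "catalog_number": "bbbrc000123"}]
source_b = [{"id": SPECIMEN_ID, "previous_catalog_number": "212345"}]

combined = aggregate(source_a, source_b)
print(combined[SPECIMEN_ID])
```

If the two sources used different strings for the same object, this simple merge would keep them apart, which is why shared identifiers matter for aggregation.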
  • Slide 4
  • HTTP URIs
      • Biggest problem: identification and 2 types of resolution are commingled
      • Resolution
          • Where to get information: look somewhere
          • How to get information: fetch information using some protocol
  • Slide 5
  • DOI example
      • The DOI is 10.3897/zookeys.209.3135
      • URI (for aggregating) is doi:10.3897/zookeys.209.3135
      • A URL for information retrieval (proxy resolution) is http://dx.doi.org/10.3897/zookeys.209.3135
      • Information fetched from
          • HTML: http://www.pensoft.net/journals/zookeys/article/3135/abstract/five-task-clusters-that-enable-efficient-and-effective-digitization-of-biological-collections
          • RDF: http://data.crossref.org/10.3897/zookeys.209.3135
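A minimal sketch of how the three string forms on this slide relate to each other. The function names `doi_forms` and `bare_doi` are made up for illustration. The proxy prefix is the one shown on the slide, and no network request is made.

```python
DOI_PROXY = "http://dx.doi.org/"  # proxy resolver shown on the slide

def doi_forms(doi):
    """Return the bare DOI, its URI for aggregation, and its retrieval URL."""
    return {"doi": doi, "uri": "doi:" + doi, "url": DOI_PROXY + doi}

def bare_doi(identifier):
    """Strip a known prefix so that different forms compare as equal."""
    for prefix in ("doi:", DOI_PROXY):
        if identifier.startswith(prefix):
            return identifier[len(prefix):]
    return identifier

forms = doi_forms("10.3897/zookeys.209.3135")
print(forms["uri"])  # used for comparison and aggregation
print(forms["url"])  # used to fetch information through the proxy
```

Keeping the identifier (the DOI) separate from the retrieval URL means the resolver can change without changing the identifier.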
  • Slide 6
  • What's in an ID? For consumer: NOTHING!
      • No information
      • Might as well be UUID
      • Can't type it, remember it, parse it, resolve it
      • Useful for
          • comparison and aggregation
              • Equal strings (persistence)
              • Different strings about the same object
          • fetching information
              • Send the ID somewhere for info
  • Slide 7
  • What's in an ID? For provider/resolver:
      • Use ID to find local storage of information
      • E.g. parse out the DwC triple
          • Extract the database table and primary key
      • Look up the ID in a table of IDs
      • Look up ID in a URI field of a database table
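One way a resolver might parse a DwC triple to find local storage is sketched below. It assumes the triple has the form institution:collection:catalog number, optionally prefixed with `urn:catalog:` as in the slide 18 example. The `TABLE_FOR_COLLECTION` mapping and the table name are hypothetical.

```python
from typing import NamedTuple

class DwcTriple(NamedTuple):
    institution_code: str
    collection_code: str
    catalog_number: str

# Hypothetical mapping from collection code to a local table name.
TABLE_FOR_COLLECTION = {"herb": "herbarium_specimen"}

def parse_dwc_triple(identifier):
    """Split 'institution:collection:catalog' into its three parts."""
    prefix = "urn:catalog:"
    if identifier.startswith(prefix):
        identifier = identifier[len(prefix):]
    parts = identifier.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a DwC triple: {identifier!r}")
    return DwcTriple(*parts)

def locate(identifier):
    """Return the (table, key) where the provider stores this object."""
    triple = parse_dwc_triple(identifier)
    table = TABLE_FOR_COLLECTION[triple.collection_code]
    return table, triple.catalog_number

location = locate("urn:catalog:flmnh:herb:bbbrc000123")
print(location)
```

Only the provider should parse an identifier this way. A consumer treats the same string as opaque.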
  • Slide 8
  • What's in an ID for the provider?
      • record id: 112234
      • uuid: 954c8760-e1a6-4b4b-ab82-6bf7311c25f3
      • lsid: urn:lsid:example.org:specimen:22545
      • an http-uri
      • ezid: http://n2t.net/ark:/99999/fk42b9hdf
      • doi: doi:10.1038/ng0609-637
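A provider that accepts several identifier forms first has to tell them apart. The sketch below guesses the scheme from the string alone. The `classify` function and its category labels are assumptions made for illustration, not part of any standard.

```python
import uuid

def classify(identifier):
    """Guess the identifier scheme from its string form."""
    lowered = identifier.lower()
    if lowered.startswith("urn:lsid:"):
        return "lsid"
    if lowered.startswith("doi:"):
        return "doi"
    if lowered.startswith(("http://", "https://")):
        return "http-uri"
    try:
        uuid.UUID(identifier)  # raises ValueError if not a UUID
        return "uuid"
    except ValueError:
        return "local"  # e.g. a record id / primary key

for example in (
    "112234",
    "954c8760-e1a6-4b4b-ab82-6bf7311c25f3",
    "urn:lsid:example.org:specimen:22545",
    "http://n2t.net/ark:/99999/fk42b9hdf",
    "doi:10.1038/ng0609-637",
):
    print(example, "->", classify(example))
```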
  • Slide 9
  • What about specimen identifiers?
      • identifier on the specimen?
          • readable text
          • encoded data
          • barcode is a contextual identifier
      • identifier in the database?
          • http://ids.usms.edu/herb/0014097
          • http://ids.usms.edu/herb/0303134303937
  • Slide 10
  • How do providers identify?
      • Look at online databases and your own database, and find the identifiers of the various objects
      • Some identifiers are local (e.g. primary key)
      • Some identifiers are globally unique
      • Some identifiers are URIs
  • Slide 11
  • Identification in the field
  • Slide 12
  • Storing IDs in databases
      • your contextual ids? your guids?
      • What to use for IDs?
          • record id
          • uuid
          • lsid
          • uri
      • what's in your wallet database?
      • Morphbank Example
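One possible storage strategy is a separate table of identifiers that points back to the local record, so a record can carry any number of IDs of any type. The schema below is a sketch using an in-memory SQLite database. The table and column names are hypothetical, and the sample values come from the specimen example on slide 18.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (
    record_id INTEGER PRIMARY KEY          -- local identifier
);
CREATE TABLE identifier (
    value     TEXT PRIMARY KEY,            -- one string, one object
    id_type   TEXT NOT NULL,               -- e.g. 'uuid', 'lsid', 'uri'
    record_id INTEGER NOT NULL REFERENCES specimen(record_id)
);
""")
conn.execute("INSERT INTO specimen (record_id) VALUES (?)", (19537,))
conn.executemany(
    "INSERT INTO identifier (value, id_type, record_id) VALUES (?, ?, ?)",
    [
        ("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f", "uuid", 19537),
        ("urn:lsid:biocol.org:flmnh:bbbrc000123", "lsid", 19537),
        ("http://ids.flmnh.ufl.edu/herb/bbbrc000123", "uri", 19537),
    ],
)

def find_record(value):
    """Look up the local record id for any known identifier string."""
    row = conn.execute(
        "SELECT record_id FROM identifier WHERE value = ?", (value,)
    ).fetchone()
    return row[0] if row else None

print(find_record("urn:lsid:biocol.org:flmnh:bbbrc000123"))
```

The alternative mentioned on the slides, a URI column on the record table itself, is simpler but holds only one identifier per record.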
  • Slide 13
  • IDs in Morphbank: Morphbank example, http://www.morphbank.net/818505
  • Slide 14
  • IDs in Morphbank: Morphbank example, http://www.morphbank.net/643261
  • Slide 15
  • Sharing data with IDs
      • into a publication
      • uploaded to the web
      • data shared with a database integrator / aggregator
          • GBIF
          • iDigBio
          • VertNet
          • Morphbank
      • what is it exactly in the publication?
          • an id? a guid?
          • a link to more information?
      • what will be cited? searched for?
  • Slide 16
  • Feedback with IDs
      • Annotations
          • Target of annotation: http://www.morphbank.net/818505
          • filtered PUSH
      • linked data ~ the semantic web (benefits in a minute)
      • updating the database: be(a)ware
          • Remember previous IDs
  • Slide 17
  • What's coming up next?
      • expect guids for all sorts of objects
          • collection objects (example: specimen)
          • georeferences
          • taxon concepts
          • determinations
          • people
  • Slide 18
  • GUIDs are key
      • 1 to many IDs known for a given object
      • store and share the ones you know about
          • Specimen RecordID: 19537
          • Specimen Previous Catalog Number: 212345
          • Specimen Catalog Number / bar code: bbbrc000123
          • Darwin Core Triplet (DwC): flmnh:herb:bbbrc000123
          • DwC Occurrence URI: urn:catalog:flmnh:herb:bbbrc000123
          • Specimen GUID of type lsid: urn:lsid:biocol.org:flmnh:bbbrc000123
          • Specimen Opaque Identifier (UUID): 424854d7-baec-42cf-a142-805b64117b9f
          • URI for UUID: urn:uuid:424854d7-baec-42cf-a142-805b64117b9f
          • Specimen GUID of type HTTP-URI: http://ids.flmnh.ufl.edu/herb/bbbrc000123
      • *Cannot enforce single identifier per object
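Several of the identifiers on this slide can be derived from the same few components. The sketch below builds the DwC triplet, the `urn:catalog:` URI, and the `urn:uuid:` URI for one specimen, then treats two sets of identifiers as describing the same object when they overlap. The helper names are hypothetical, and the overlap test assumes every identifier involved is persistent and correctly assigned.

```python
import uuid

def known_identifiers(institution, collection, catalog_number, specimen_uuid):
    """Build the identifier strings that can be derived for one specimen."""
    triple = f"{institution}:{collection}:{catalog_number}"
    return {
        triple,                    # Darwin Core triplet
        "urn:catalog:" + triple,   # DwC occurrence URI
        str(specimen_uuid),        # opaque identifier
        specimen_uuid.urn,         # URI form of the UUID
    }

def same_object(ids_a, ids_b):
    """Two identifier sets are taken to match if they share any string."""
    return not ids_a.isdisjoint(ids_b)

ids = known_identifiers(
    "flmnh", "herb", "bbbrc000123",
    uuid.UUID("424854d7-baec-42cf-a142-805b64117b9f"),
)
for value in sorted(ids):
    print(value)
```

Sharing every identifier you know about gives other providers more chances to find an overlap.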
  • Slide 19
  • caring for guids
      • store them
          • database adjustments
          • tweaking current standard practices
      • share them
          • data standards
          • 3 ways to modify Darwin Core
      • reap the benefits
  • Slide 20
  • caring for guids: reap the benefits
      • Data quality feedback
      • Dialog based on annotation
      • Tracking objects through analysis and use
      • Maintaining attribution to provider
      • Find related objects
      • Find a way to take advantage of efforts of many smart dedicated people
          • BHL, biscicol, filtered PUSH, GNA, TNRS, SGR,
  • Slide 21
  • Thanks from iDigBio
  • Slide 22
  • uniqueness
      • Uniqueness can be guaranteed
          • by context, as in UPC, ISBN, DOI
          • by design: URI based on scheme plus DNS
          • by sparseness, as in UUID
      • Uniqueness can be reinforced by encoding
          • As in UPC, make values sparse
      • Cannot enforce single identifier per object
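Two of these ideas can be shown concretely. The slide does not spell out the UPC encoding. The sketch below uses the standard UPC-A check digit, under which only about one in ten 12-digit strings is valid, so a mistyped code is usually rejected. Sparseness in the UUID sense is shown with a randomly generated version 4 UUID. The function names are made up for illustration.

```python
import uuid

def upc_check_digit(first_eleven):
    """Compute the UPC-A check digit for the first 11 digits."""
    digits = [int(ch) for ch in first_eleven]
    if len(digits) != 11:
        raise ValueError("expected 11 digits")
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, ... (1-based)
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, ...
    return (10 - (3 * odd_sum + even_sum) % 10) % 10

def is_valid_upc(code):
    """True if a 12-digit code ends with the correct check digit."""
    return (
        len(code) == 12
        and code.isdigit()
        and upc_check_digit(code[:11]) == int(code[11])
    )

print(is_valid_upc("036000291452"))  # well-formed code
print(is_valid_upc("036000291453"))  # last digit altered

# A random (version 4) UUID: unique in practice because the space is so large.
print(uuid.uuid4())
```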
  • Slide 23
  • persistence
      • Persistence refers to the binding of identifier to object
          • Not object availability
          • An unexpected interpretation
      • A persistent identifier is one that can be relied on for its connection to an object.
      • Once assigned to 1 object it will never be assigned to another
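The binding rule can be sketched as a small registry that refuses to reassign an identifier once it is bound. The `IdentifierRegistry` class is a hypothetical illustration. Note that it still allows many identifiers for one object, in line with the point that a single identifier per object cannot be enforced.

```python
class IdentifierRegistry:
    """Keeps identifier-to-object bindings and never rebinds one."""

    def __init__(self):
        self._bindings = {}

    def assign(self, identifier, object_key):
        """Bind an identifier to an object; reject any reassignment."""
        if identifier in self._bindings and self._bindings[identifier] != object_key:
            raise ValueError(f"{identifier!r} is already bound to another object")
        self._bindings[identifier] = object_key

    def resolve(self, identifier):
        """Return the bound object key, or None if the ID is unknown."""
        return self._bindings.get(identifier)

registry = IdentifierRegistry()
registry.assign("urn:catalog:flmnh:herb:bbbrc000123", 19537)
registry.assign("urn:lsid:biocol.org:flmnh:bbbrc000123", 19537)  # second ID, same object

try:
    registry.assign("urn:catalog:flmnh:herb:bbbrc000123", 20000)
except ValueError as err:
    print("rejected:", err)
```

Persistence here says nothing about whether the object is still available. The registry only guarantees that the identifier keeps pointing at the same object.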