IDs in and out of the database
Entomological Collections Network (ECN) 2012, November 10-11, Knoxville, TN
Debbie Paul, Greg Riccardi
Slide 2
Overview
  What good is identification?
  How are identifiers used by consumers
  Providing IDs
  Resolving IDs in a server
  Strategies for storing IDs in databases
  Linked Data
  Annotations ~ all sorts
  Feedback
Slide 3
What good is identification?
  Aggregation
    If you get info from 2 sources that are about the same object, you can combine the info
  Resolution (finding information about object)
    Types of resolution
      Determine where to get information
      Determine how to get information
  Providing information
    How to create IDs
    How to publish IDs
    How to fetch database information for IDs
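The aggregation point can be sketched in a few lines of Python. The two source dictionaries, their field names and values, and the `aggregate` helper are all made up for illustration; the only thing that matters is that both sources use the same identifier string.

```python
# Hypothetical records from two providers, keyed by the identifier each
# provider published. Field names and values are invented for illustration.
source_a = {
    "http://www.morphbank.net/818505": {"label_text": "example label"},
}
source_b = {
    "http://www.morphbank.net/818505": {"locality": "example locality"},
    "http://www.morphbank.net/643261": {"locality": "another locality"},
}


def aggregate(*sources):
    """Combine records that carry the same identifier string."""
    combined = {}
    for source in sources:
        for identifier, fields in source.items():
            # Equal identifier strings are taken to mean "same object".
            combined.setdefault(identifier, {}).update(fields)
    return combined


merged = aggregate(source_a, source_b)
print(merged["http://www.morphbank.net/818505"])
```

Nothing in the identifier is parsed here: the consumer only compares strings, which is why the same ID from two sources is enough to combine their information.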
Slide 4
HTTP URIs
  Biggest problem
    Identification and 2 types of resolution are commingled
  Resolution
    Where to get information
      Look somewhere
    How to get information
      Fetch information using some protocol
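To see the commingling concretely, an HTTP URI can be split with Python's standard `urllib.parse`. This is only an illustration; the URI is the DOI proxy URL from the DOI example, and the variable names `how`, `where`, and `what` are ours.

```python
from urllib.parse import urlsplit

uri = "http://dx.doi.org/10.3897/zookeys.209.3135"
parts = urlsplit(uri)

how = parts.scheme             # protocol used to fetch the information
where = parts.netloc           # server to ask for the information
what = parts.path.lstrip("/")  # the part that actually identifies the object

print(how, where, what)
```

A single string carries all three roles, so changing the server or the protocol also changes the string that consumers compare for identity.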
Slide 5
DOI example
  The DOI is 10.3897/zookeys.209.3135
  URI (for aggregating) is doi:10.3897/zookeys.209.3135
  A URL for information retrieval (proxy resolution) is
    http://dx.doi.org/10.3897/zookeys.209.3135
  Information fetched from
    HTML: http://www.pensoft.net/journals/zookeys/article/3135/abstract/five-task-clusters-that-enable-efficient-and-effective-digitization-of-biological-collections
    RDF: http://data.crossref.org/10.3897/zookeys.209.3135
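The three forms of the DOI can be derived from one another mechanically. The sketch below uses only the standard library; the helper names are hypothetical, and the assumption that the proxy picks HTML or RDF based on the `Accept` header (content negotiation) is ours, not stated on the slide. The request is built but never sent.

```python
from urllib.request import Request

PROXY = "http://dx.doi.org/"  # proxy resolver named in the DOI example


def doi_uri(doi):
    """URI form, suitable for comparison and aggregation."""
    return "doi:" + doi


def doi_url(doi):
    """URL form, suitable for retrieval through the proxy."""
    return PROXY + doi


def rdf_request(doi):
    # Assumption: the resolver honours the Accept header and redirects
    # to an RDF representation instead of the HTML landing page.
    return Request(doi_url(doi), headers={"Accept": "application/rdf+xml"})


doi = "10.3897/zookeys.209.3135"
req = rdf_request(doi)
print(doi_uri(doi))
print(req.full_url)
print(req.get_header("Accept"))
```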
Slide 6
What's in an ID?
  For consumer: NOTHING!
    No information
    Might as well be UUID
    Can't type it, remember it, parse it, resolve it
  Useful for
    comparison and aggregation
      Equal strings (persistence)
      Different strings about the same object
    fetching information
      Send the ID somewhere for info
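From the consumer's side, comparison is string equality plus, at most, a table of strings known to refer to the same object. A minimal sketch follows; the identifier strings are borrowed from the "GUIDs are key" slide, while the `SAME_OBJECT` table and the `same_object` helper are assumptions for illustration.

```python
# Groups of identifier strings known to refer to one object.
# The grouping table itself is hypothetical.
SAME_OBJECT = [
    {
        "urn:uuid:424854d7-baec-42cf-a142-805b64117b9f",
        "http://ids.flmnh.ufl.edu/herb/bbbrc000123",
        "urn:catalog:flmnh:herb:bbbrc000123",
    },
]


def same_object(id_a, id_b):
    """Compare IDs as opaque strings; never parse them."""
    if id_a == id_b:
        return True  # equal strings: same object, if IDs are persistent
    # Different strings: only a shared group tells us they match.
    return any(id_a in group and id_b in group for group in SAME_OBJECT)


print(same_object("urn:catalog:flmnh:herb:bbbrc000123",
                  "http://ids.flmnh.ufl.edu/herb/bbbrc000123"))
```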
Slide 7
What's in an ID?
  For provider/resolver:
    Use ID to find local storage of information
      E.g. parse out the DwC triple
      Extract the database table and primary key
      Look up the ID in a table of IDs
      Look up ID in a URI field of a database table
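The first three provider-side strategies might look like this in Python. The identifier values come from the "GUIDs are key" slide; the function names, the `/<table>/<primary key>` URI layout, and the in-memory `ID_TABLE` are assumptions for the sake of the sketch, not a description of any real resolver.

```python
from urllib.parse import urlsplit


def parse_dwc_triple(triple):
    """Split 'institution:collection:catalog number' into its parts."""
    institution, collection, catalog_number = triple.split(":", 2)
    return institution, collection, catalog_number


def table_and_key(uri):
    """Assumes a hypothetical URI layout of /<table>/<primary key>."""
    table, key = urlsplit(uri).path.strip("/").split("/")
    return table, key


# A table of IDs: works for opaque identifiers that cannot be parsed.
ID_TABLE = {
    "urn:uuid:424854d7-baec-42cf-a142-805b64117b9f": ("herb", "bbbrc000123"),
}


def lookup(identifier):
    """Return (table, key) for a known identifier, else None."""
    return ID_TABLE.get(identifier)


print(parse_dwc_triple("flmnh:herb:bbbrc000123"))
print(table_and_key("http://ids.flmnh.ufl.edu/herb/bbbrc000123"))
print(lookup("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f"))
```

Parsing only works while the provider controls the structure of the ID; the lookup table keeps working even if the ID is fully opaque.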
Slide 8
What's in an id for the provider?
  record id    112234
  uuid         954c8760-e1a6-4b4b-ab82-6bf7311c25f3
  lsid         urn:lsid:example.org:specimen:22545
  an http-uri
  ezid         http://n2t.net/ark:/99999/fk42b9hdf
  doi          doi:10.1038/ng0609-637
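A provider receiving one of these strings first has to decide which kind it is. The sketch below classifies the examples above by prefix or shape; `identifier_kind` and its labels are hypothetical, and the rules are deliberately simple.

```python
import uuid


def identifier_kind(value):
    """Guess the kind of identifier from its prefix or shape."""
    if value.startswith("urn:lsid:"):
        return "lsid"
    if value.startswith("doi:"):
        return "doi"
    if value.startswith(("http://", "https://")):
        return "http-uri"  # also covers the EZID/ARK URL above
    if value.isdigit():
        return "record id"  # assumed to be a local primary key
    try:
        uuid.UUID(value)  # raises ValueError if not a UUID string
        return "uuid"
    except ValueError:
        return "unknown"


for example in ["112234",
                "954c8760-e1a6-4b4b-ab82-6bf7311c25f3",
                "urn:lsid:example.org:specimen:22545",
                "http://n2t.net/ark:/99999/fk42b9hdf",
                "doi:10.1038/ng0609-637"]:
    print(identifier_kind(example), example)
```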
Slide 9
What about Specimen identifiers?
  identifier on the specimen?
    readable text
    encoded data
    barcode is a contextual identifier
  identifier in the database?
    http://ids.usms.edu/herb/0014097
    http://ids.usms.edu/herb/0303134303937
Slide 10
How do providers identify?
  Look at online databases and at your own database, and find the identifiers of the various objects
    Some identifiers are local (e.g. primary key)
    Some identifiers are globally unique
    Some identifiers are URIs
Slide 11
Identification in the field
Slide 12
Storing IDs in databases
  your contextual ids?, your guids?
  What to use for IDs?
    record id
    uuid
    lsid
    uri
  what's in your wallet database?
  Morphbank Example
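One possible database adjustment is to keep the local record id and add a column for a globally unique identifier. The SQLite sketch below is an assumption for illustration: the table layout, column names, and the second catalog number are invented, and a UUID is only one of the ID types listed above.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen (record_id INTEGER PRIMARY KEY, catalog_number TEXT)"
)
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?)",
    [(19537, "bbbrc000123"), (19538, "bbbrc000124")],  # second row is made up
)

# Adjustment: add a guid column and mint one UUID per existing record.
conn.execute("ALTER TABLE specimen ADD COLUMN guid TEXT")
for (record_id,) in conn.execute("SELECT record_id FROM specimen").fetchall():
    conn.execute(
        "UPDATE specimen SET guid = ? WHERE record_id = ?",
        ("urn:uuid:" + str(uuid.uuid4()), record_id),
    )

# No two records may share a guid.
conn.execute("CREATE UNIQUE INDEX specimen_guid ON specimen (guid)")

for row in conn.execute("SELECT record_id, catalog_number, guid FROM specimen"):
    print(row)
```

The local primary key stays in place for internal joins; the guid is what gets shared outside the database.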
Slide 13
IDs in Morphbank
  Morphbank Example
  http://www.morphbank.net/818505
Slide 14
IDs in Morphbank
  Morphbank Example
  http://www.morphbank.net/643261
Slide 15
Sharing data with IDs
  into a publication
  uploaded to the web
  data shared with a database integrator / aggregator
    GBIF
    iDigBio
    VertNet
    Morphbank
  what is it exactly in the publication?
    an id?, a guid?
    a link to more information?
  what will be cited? searched for?
Slide 16
Feedback with IDs
  Annotations
    Target of annotation http://www.morphbank.net/818505
    filtered PUSH
  linked data ~ the semantic web (benefits in a minute)
  updating the database
    be(a)ware
    Remember previous IDs
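Why previous IDs matter for feedback can be shown with a small sketch. The catalog numbers and record id are the ones from the "GUIDs are key" slide; the `PREVIOUS_IDS` mapping, the annotation structure, and the `deliver` helper are hypothetical and do not describe how Filtered Push or any other annotation system works.

```python
# previous identifier -> identifier that replaced it
PREVIOUS_IDS = {"212345": "bbbrc000123"}

# current identifier -> local record
RECORDS = {"bbbrc000123": {"record_id": 19537}}


def current_id(identifier):
    """Follow the chain of replaced identifiers to the current one."""
    seen = set()
    while identifier in PREVIOUS_IDS and identifier not in seen:
        seen.add(identifier)  # guard against accidental cycles
        identifier = PREVIOUS_IDS[identifier]
    return identifier


def deliver(annotation):
    """Attach an annotation to the record its target refers to."""
    record = RECORDS.get(current_id(annotation["target"]))
    if record is None:
        return False  # target unknown: the feedback is lost
    record.setdefault("annotations", []).append(annotation["body"])
    return True


# An annotation written against the old catalog number still arrives.
print(deliver({"target": "212345", "body": "example comment"}))
```

If the old number had been discarded when the database was updated, the same call would return `False` and the feedback would never reach the record.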
Slide 17
What's coming up next?
  expect guids for all sorts of objects
    collection objects (example: specimen)
    georeferences
    taxon concepts
    determinations
    people
Slide 18
GUIDs are key
  1 to many IDs known for a given object
  store and share the ones you know about

  Specimen RecordID                     19537
  Specimen Previous Catalog Number      212345
  Specimen Catalog Number / bar code    bbbrc000123
  Darwin Core Triplet (DwC)             flmnh:herb:bbbrc000123
  DwC Occurrence URI                    urn:catalog:flmnh:herb:bbbrc000123
  Specimen GUID of type lsid            urn:lsid:biocol.org:flmnh:bbbrc000123
  Specimen Opaque Identifier (UUID)     424854d7-baec-42cf-a142-805b64117b9f
  URI for UUID                          urn:uuid:424854d7-baec-42cf-a142-805b64117b9f
  Specimen GUID of type HTTP-URI        http://ids.flmnh.ufl.edu/herb/bbbrc000123

  *Cannot enforce single identifier per object
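Since one object ends up with many identifiers, one way to store them is a separate identifier table pointing back at the record. The SQLite schema below is a sketch under our own assumptions (table and column names are invented); the identifier values are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (record_id INTEGER PRIMARY KEY);
CREATE TABLE specimen_identifier (
    identifier TEXT PRIMARY KEY,  -- each identifier names one object
    id_type    TEXT NOT NULL,
    record_id  INTEGER NOT NULL REFERENCES specimen(record_id)
);
""")
conn.execute("INSERT INTO specimen VALUES (19537)")

known = [
    ("212345", "previous catalog number"),
    ("bbbrc000123", "catalog number / bar code"),
    ("flmnh:herb:bbbrc000123", "DwC triplet"),
    ("urn:catalog:flmnh:herb:bbbrc000123", "DwC occurrence URI"),
    ("urn:lsid:biocol.org:flmnh:bbbrc000123", "lsid"),
    ("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f", "uuid URI"),
    ("http://ids.flmnh.ufl.edu/herb/bbbrc000123", "HTTP-URI"),
]
conn.executemany(
    "INSERT INTO specimen_identifier (identifier, id_type, record_id) "
    "VALUES (?, ?, 19537)",
    known,
)


def find_record(identifier):
    """Return the local record id for any known identifier, else None."""
    row = conn.execute(
        "SELECT record_id FROM specimen_identifier WHERE identifier = ?",
        (identifier,),
    ).fetchone()
    return row[0] if row else None


print(find_record("flmnh:herb:bbbrc000123"))
print(find_record("http://ids.flmnh.ufl.edu/herb/bbbrc000123"))
```

The primary key on `identifier` keeps one identifier from pointing at two objects, while nothing limits how many identifiers one object collects.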
Slide 19
caring for guids
  store them
    database adjustments
    tweaking current standard practices
  share them
    data standards
    3 ways to modify Darwin Core
  reap the benefits
Slide 20
caring for guids
  reap the benefits
    Data quality feedback
    Dialog based on annotation
    Tracking objects through analysis and use
    Maintaining attribution to provider
    Find related objects
    Find a way to take advantage of efforts of many smart dedicated people
      BHL, biscicol, filtered PUSH, GNA, TNRS, SGR,
Slide 21
Thanks from iDigBio
Slide 22
uniqueness
  Uniqueness can be guaranteed
    by context as in UPC, ISBN, DOI
    by design: URI based on scheme plus DNS
    By sparseness as in UUID
  Uniqueness can be reinforced by encoding
    As in UPC, make values sparse
  Cannot enforce single identifier per object
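Two of these ideas are easy to demonstrate. The UPC-A check digit means only one in ten 12-digit strings is a valid code, so most typing errors are caught; a random (version 4) UUID relies on sparseness instead, with 122 random bits. The function names below are ours, and the sample code 036000291452 is a commonly used illustration value, not one from the slides.

```python
import uuid


def upc_check_digit(first_eleven):
    """Compute the UPC-A check digit for the first 11 digits."""
    digits = [int(ch) for ch in first_eleven]
    if len(digits) != 11:
        raise ValueError("expected exactly 11 digits")
    # Positions 1, 3, 5, ... are weighted 3; positions 2, 4, ... weighted 1.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10


def is_valid_upc(code):
    """True if code is 12 digits and the last digit matches the check."""
    return (
        len(code) == 12
        and code.isdigit()
        and upc_check_digit(code[:11]) == int(code[11])
    )


print(is_valid_upc("036000291452"))  # correct check digit
print(is_valid_upc("036000291453"))  # last digit mistyped

# Sparseness: two freshly minted UUIDs colliding is vanishingly unlikely.
print(uuid.uuid4() != uuid.uuid4())
```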
Slide 23
persistence
  Persistence refers to the binding of identifier to object
    Not object availability
    An unexpected interpretation
  A persistent identifier is one that can be relied on for its connection to an object.
  Once assigned to 1 object it will never be assigned to another
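The rule "once assigned to 1 object it will never be assigned to another" can be enforced by a registry that refuses rebinding. The class below is a minimal sketch with hypothetical names; it also keeps the binding when an object is withdrawn, since persistence is about the binding and not about availability.

```python
class IdentifierRegistry:
    """Binds identifiers to objects and never rebinds them."""

    def __init__(self):
        self._bindings = {}      # identifier -> object key
        self._withdrawn = set()  # identifiers whose object is unavailable

    def assign(self, identifier, object_key):
        bound = self._bindings.setdefault(identifier, object_key)
        if bound != object_key:
            raise ValueError(
                f"{identifier!r} is already bound to {bound!r}"
            )

    def withdraw(self, identifier):
        # The object becomes unavailable, but the identifier stays bound
        # and can never be reused for something else.
        self._withdrawn.add(identifier)

    def resolve(self, identifier):
        """Return (object key, available?) or None if never assigned."""
        if identifier not in self._bindings:
            return None
        return self._bindings[identifier], identifier not in self._withdrawn


registry = IdentifierRegistry()
registry.assign("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f", 19537)
registry.withdraw("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f")
print(registry.resolve("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f"))
```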