Persistent Identifiers, Herbarium workshop at Kongsvold Fjellstue, September 1-4, 2014. Dag Endresen, NHM-UiO, GBIF-Norway

2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014


DESCRIPTION

Implementation of persistent and globally unique identifiers for specimens held in natural history collections worldwide will open up new opportunities for referring to these physical resources in an interlinked digital context such as the Internet. Here, we will describe the approach for persistent identification of collection specimens developed and implemented at the Natural History Museum in Oslo (NHM-UiO) by the Norwegian participant node to the Global Biodiversity Information Facility (GBIF-Norway). The Norwegian university museums are invited to use our resolver service at "http://purl.org/gbifnorway/id/" when publishing biodiversity data to GBIF. All occurrence records published through GBIF-Norway, with appropriate PURL-UUID identifiers mapped to the Darwin Core occurrenceID, will automatically be added to our resolver service and kept updated.

Citation preview

Page 1: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Persistent Identifiers, Herbarium workshop at Kongsvold Fjellstue, September 1-4, 2014.

Dag Endresen, NHM-UiO, GBIF-Norway

Page 2: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

The purpose of identifiers is to name things, making it possible to refer to them.

Page 3: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Name ambiguity: George

Many things are named George

Page 4: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

What is an identifier:
"Each identifier refers to one and only one thing" (Coyle 2006).
"An association between a string and a thing" (Kunze 2003).
"A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context" (Campbell 2007).

Page 5: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014


Page 6: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

When is the identifier "good enough"?
Unique and persistent - within a given context.
"The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context" (Coyle 2006).

Expanding context:
• Within one museum collection (catalog number).
• Within a network between museum collections (collection code + catalogue number).
• Within a biodiversity information network (institution code + collection/dataset code + catalogue number).
• On the Internet (e.g. http URI, DOI, LSID, etc…)
• … larger contexts are possible to imagine in the future!!

Page 7: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Expanding context

Page 8: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Identify the thing that you care about

• The specimen itself (the physical entity)
• Image of the specimen
• Description of the specimen
• Location where the specimen was captured
• The occurrence event when the specimen was captured
• …

Page 9: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Record-level Terms: dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties

Occurrence: occurrenceID | catalogNumber | occurrenceRemarks | recordNumber | recordedBy | individualID | individualCount | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | otherCatalogNumbers | previousIdentifications | associatedMedia | associatedReferences | associatedOccurrences | associatedSequences | associatedTaxa

MaterialSample: materialSampleID

Event: eventID | samplingProtocol | samplingEffort | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | fieldNumber | fieldNotes | eventRemarks

dcterms:Location: locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks

GeologicalContext: geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed

Identification: identificationID | identifiedBy | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks | identificationQualifier | typeStatus

Taxon: taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks

ResourceRelationship (Auxiliary Terms): resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks

MeasurementOrFact (Auxiliary Terms): measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks

Page 10: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Term name: occurrenceID
Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID
Class: http://rs.tdwg.org/dwc/terms/Occurrence
Definition: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
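The fallback form quoted in the occurrenceID comment can be sketched in a few lines of Python. This is only an illustration: the function name `build_occurrence_id` is hypothetical, and rejecting colons inside the parts is an assumption made here to keep the result unambiguous, not something the Darwin Core definition states.

```python
def build_occurrence_id(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Compose urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]."""
    parts = [p.strip() for p in (institution_code, collection_code, catalog_number)]
    if not all(parts):
        raise ValueError("institution code, collection code and catalog number are all required")
    # Assumption for this sketch: a colon inside a part would make the URN ambiguous.
    if any(":" in p for p in parts):
        raise ValueError("parts must not contain ':'")
    return "urn:catalog:" + ":".join(parts)


# Reproduces one of the examples given in the term comment.
print(build_occurrence_id("FMNH", "Mammal", "145732"))
```

Such an identifier is only as unique as the combination of codes it is built from, which is why the definition treats it as a substitute to use when no persistent global unique identifier exists.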

Page 11: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Identifiers for museum collections

The longevity of museums leads to: "The need to use identifiers from our past in the current highly-networked digital systems" (Coyle 2006 [talking about libraries]).

Specify a namespace for the identifiers?
• URI – uniform resource identifier (unique in the context of the web).
• URN – uniform resource name (name not tied to location).
• URL – uniform resource locator (network location as identifier).
• PURL – persistent URL (commitment to service longevity).

Something else…?
• DOI – digital object identifier
• ARK – archival resource key
• UUID – universally unique identifier

Page 12: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

• Persistent Identifier (PID)
• Globally Unique Identifier (GUID)
• Uniform Resource Identifier (URI)
• Persistent Uniform Resource Locator (PURL)
• Life Science Identifier (LSID)
• Digital Object Identifier (DOI)
• Handle system (Handle)
• Archival Resource Key (ARK, EZID)
• Universally Unique Identifier (UUID)
• …

Page 13: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Photo: Smithsonian National Museum of Natural History, USNM-445024-Eutoxeres-aquila

PURL

Reuse existing identifiers

Page 14: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3

Illustration by Miroslav Šašek (1963)

Reuse identifiers

Page 15: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

• Globally unique
• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Solution for the generation of new IDs
  – Central generation, PID issuer
  – Distributed generation at source

Page 16: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

• A UUID is a 16-octet (128-bit) number, usually written as 36 characters (32 hexadecimal digits and 4 hyphens).
• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
• The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs.
• Allows for easy generation at source in a distributed network.
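As a rough check of these points, the sketch below generates a random UUID with Python's standard `uuid` module and estimates how many UUIDs are needed for a 50% chance of at least one duplicate. It assumes version-4 (random) UUIDs, which carry 122 random bits, and uses the usual birthday-bound approximation; the helper name `uuids_for_collision_probability` is introduced here for illustration only.

```python
import math
import uuid

# Generate a random (version 4) UUID locally, with no central issuer involved.
new_id = uuid.uuid4()
text = str(new_id)
print(text, len(text), new_id.version)

# Assumption: version-4 UUIDs, i.e. 128 bits minus 6 fixed version/variant bits.
RANDOM_BITS = 122


def uuids_for_collision_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation: n is about sqrt(2 * N * ln(1 / (1 - p))), with N = 2**bits."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


n_half = uuids_for_collision_probability(0.5)
print(f"UUIDs needed for a 50% collision chance: {n_half:.2e}")
print(f"People needed if each creates 600 million: {n_half / 600e6:.1e}")
```

Under these assumptions the threshold is on the order of 10^18 UUIDs, corresponding to a few billion people creating 600 million UUIDs each, the same order of magnitude as the figure quoted on the slide.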

Page 17: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Diagram labels: Identifier, Resolver, Location, Specimen

The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes.
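The idea of a resolver can be sketched as a lookup table that maps a stable identifier to whatever the current location is. This is a minimal in-memory illustration, not the NHM-UiO implementation; the class name `Resolver` and the `example.org` locations are hypothetical.

```python
class Resolver:
    """Toy resolver: identifiers stay fixed while locations may be updated."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier: str, location: str) -> None:
        # Registering an existing identifier again simply updates its location.
        self._locations[identifier] = location

    def resolve(self, identifier: str) -> str:
        # Raises KeyError for identifiers that were never registered.
        return self._locations[identifier]


resolver = Resolver()
pid = "41d9cbb4-4590-4265-8079-ca44d46d27c3"

# Hypothetical locations, used only to show that the identifier survives a move.
resolver.register(pid, "http://example.org/old-portal/specimen/1")
first = resolver.resolve(pid)
resolver.register(pid, "http://example.org/new-portal/specimen/1")
second = resolver.resolve(pid)
print(first, "->", second)
```

Anyone who stored only the identifier still reaches the specimen record after the move, because the change is made once in the resolver.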

Page 18: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

PURL technology provides a robust resolution service ready for the future - and a stable solution that is working well right now.

PURL for the NHM-resolver: http://purl.org/nhmuio/id/[PID]

The NHM-PURL redirects here: http://gbif.no/resolver/[PID]

Could with few modifications redirect e.g. here: http://gbif.org/resolver/[PID]

Page 19: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

http – PURL – UUID
http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3

Page 20: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

http://purl.org/nhmuio/id/UUID → http://gbif.no/resolver/UUID
http://purl.org/gbifnorway/id/UUID → http://gbif.no/resolver/UUID
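The two redirect rules above amount to a prefix rewrite, which can be modelled as follows. The sketch only reproduces the mapping shown on the slide and makes no network requests; the function name `redirect_target` and the UUID check are assumptions added for illustration, not a description of how purl.org is configured.

```python
import uuid

# Prefixes and target taken from the redirect rules on the slide.
PURL_PREFIXES = (
    "http://purl.org/nhmuio/id/",
    "http://purl.org/gbifnorway/id/",
)
RESOLVER_BASE = "http://gbif.no/resolver/"


def redirect_target(purl: str) -> str:
    """Return the resolver URL that a PURL of either form would redirect to."""
    for prefix in PURL_PREFIXES:
        if purl.startswith(prefix):
            pid = purl[len(prefix):]
            uuid.UUID(pid)  # raises ValueError if the suffix cannot be parsed as a UUID
            return RESOLVER_BASE + pid
    raise ValueError("not a recognised PURL prefix: " + purl)


print(redirect_target("http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3"))
```

Because the public identifier is the PURL and not the resolver address, the target base could later be changed in one place without touching identifiers that have already been published.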

Page 21: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Including machine readable formats

Page 22: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Catalog number: O-L-000014
http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3

Page 23: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Machine readable labels

Page 24: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

• Quick Response Code (QR code).
• A type of matrix barcode (or two-dimensional code).
• Popular due to its fast readability and large storage capacity.
• The use of QR Codes is free of any license.
• The QR Code is clearly defined and published as an ISO standard.
• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.

Page 25: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

http://purl.org/nhmuio/id/d91e8253-0ac1-4681-ac69-e50070af86a2

Page 26: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

UUID QR codes for museum objects at NHM-UiO provide:

• Machine-readable identifiers (using a simple smart phone - or a barcode reader).
• New and efficient workflows for collection management.
• Deployment of stable identifiers appropriate for databasing.
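A collection-management workflow built on such labels might start by turning the scanned text back into a UUID and looking up the specimen. The sketch below assumes the scanner returns the PURL as a plain string; the lookup table `CATALOG_BY_UUID` and the function `uuid_from_label` are hypothetical, and the single entry pairs the catalog number and PURL shown together on an earlier slide.

```python
import uuid
from urllib.parse import urlparse

# Hypothetical lookup table standing in for a collection database.
CATALOG_BY_UUID = {
    uuid.UUID("41d9cbb4-4590-4265-8079-ca44d46d27c3"): "O-L-000014",
}


def uuid_from_label(payload: str) -> uuid.UUID:
    """Extract the UUID from a scanned PURL; raises ValueError if the last path segment is not a UUID."""
    path = urlparse(payload.strip()).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    return uuid.UUID(last_segment)


# Assumed scanner output: the PURL encoded in the QR code.
scanned = "http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3"
specimen_uuid = uuid_from_label(scanned)
print(CATALOG_BY_UUID.get(specimen_uuid, "unknown specimen"))
```

Parsing into a `uuid.UUID` object before the lookup means that differences in letter case between label and database do not cause a miss.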

Page 27: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Efficient workflow routines

Page 28: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

http://gbif.no/dugnad/

Page 29: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

• Peer review option for biodiversity data sets.
• Authors get scientific credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Towards → Each data set published through GBIF accompanied by a data paper…?

Page 30: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Why publish your data

• Citable publication
• Establish scientific priority
• Increase collaboration
• Link data to bigger network
• Re-use and multiply effect
• Respond to funding requirements

http://biodiversitydatajournal.com/

Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995

Page 31: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Globally unique identifiers are one of the three core components in the TDWG technical architecture.

Page 32: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Status 27 August 2014

GBIF enables free and open access to biodiversity data online.

We are an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.

Page 33: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

GBIF provides a data discovery system (global registry, data portal) that is dependent on resolvable stable identifiers for efficient functionality.

Page 34: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014

Dag Endresen [email protected]
Herbarium workshop at Kongsvold Fjellstue, September 1 to 4, 2014

Gary Larson, 1987

Page 35: 2014-09 Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014


Slide 1: Image source: TU GRAZ, Austria, http://campusonline.tugraz.at/organisation/campusonline. Fair use rationale: The image is used to illustrate the principle of stable and persistent identifiers forming the glue to connect data objects.

Slide 3: George: George Orwell, George Harrison, George Bush, George Bush jr, George Soros, George Washington, Boy George, George (Seinfeld), George Lucas, George Clooney, Prince George of Cambridge, King George III of England, George Armstrong Custer, Georges Enescu, Curious George, St George in New Brunswick, George Coleman, George Eliot. Fair use rationale: Images of people and places named George from an Internet search. These images are used here to illustrate the weakness of using a human-friendly identifier/name, and that in the global society context, many people and places are named George, leading to a name ambiguity problem. We will not know which George is being referred to.

Slide 5: Photo: Sancya/AP./ Published: 03/31/2009 3:58:00, http://www.nydailynews.com/news/money/pile-unsold-cars-graveyards-gallery-1.45144 Fair use rationale: The image is used to illustrate the principle of uniqueness of identifiers within a given context - such as here car license number plates. The car license number is unlikely to be globally unique in a larger context such as e.g. the Internet.

Slide 6: Illustration retrieved from http://www.hypnosisinmelbourne.com.au/index.php?p=49. Fair use rationale: The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.

Slide 7: Fair use rationale: The image is of unknown source, retrieved from an Internet search. The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.

Slide 14: Image: This is Cape Canaveral (M. Sasek, 1963), http://blog.miroslavsasek.com/wp-content/uploads/2009/05/moon-birdwatchers-400.jpg by Miroslav Šašek (1916-1980), http://www.miroslavsasek.com/, http://www.ilike.org.uk/2009/05/this_is_m_sasek.html. Fair use rationale: The image is used here to illustrate the principle of aiming at naming an observed organism re-using common existing persistent identifiers.

Slide 23: Photo: J.Schulzki. Fair use rationale: The image is used to illustrate the principle of machine-readable labels. The handling of luggage in an airport context (or the handling of parcels and letters in a postal service context) could serve as an inspiration for developing robotized handling of museum specimens - if these specimens are given machine-readable labels.

Slide 34: Image: Gary Larson, The Far Side Observer, October 1987, http://i227.photobucket.com/albums/dd202/tomcat600/gary-larson-oct-1987.gif. Fair use rationale: This drawing is assumed to be copyrighted by Gary Larson and used here under a fair use claim. The image is used to illustrate the principle of naming all things using persistent identifiers.

The images are used for an educational, not-for-profit, non-commercial purpose.