Workshop on the DOI System DOI SYSTEM: SYNTAX International DOI Foundation


Outline / Key concepts in this section

• Terminology
• Format
• Assignment and uniqueness
• Scope of the DOI System
• Relation to other identifier schemes
• Directory management
• The uses of prefixes for management
• Administrative granularity

Further reading on key concepts in this section

DOI Handbook Chapter 2, “Numbering”: http://www.doi.org/handbook_2000/enumeration.html

DOI name

• DOI name: the string that specifies a unique object (the referent) within the DOI System. Names may consist of alphanumeric characters in a sequence prescribed by the DOI syntax.

• The terms “identifier” and “number” are sometimes, but not always, used in the same sense, and should be avoided where ambiguity might arise.

• The unqualified use of “DOI” alone may also be ambiguous: the term should always be used in conjunction with a specific noun (DOI name, DOI System, etc.).

DOI syntax

• A DOI name consists of a prefix and a suffix, e.g. 10.1223/4567

• DOI names are case insensitive
  – 10.123/ABC is identical to 10.123/AbC
  – This is a deliberate choice: see DOI Handbook 2.4

• Prefixes and suffixes use ASCII characters (letters and numbers)
  – In principle, they can use any printable characters from the Universal Character Set (UCS-2) of ISO/IEC 10646, which is the character set defined by Unicode v2.0 and encompasses most characters used in every major language written today.
  – However, because some Internet technologies make specific uses of certain characters (and these vary by browser!), it is recommended to keep to simple characters (A-Z, 0-9).
  – Note the encoding requirements when a DOI name is used with HTML, URLs and HTTP: take special care with %, ", # and [space], and with the use of angle brackets < > in XML, etc.
  – http://www.doi.org/handbook_2000/appendix_1.html#A1-E

• Prefixes are allocated to DOI name assigners; assigners then add the suffix. Registration Agencies (RAs) oversee the process to ensure there is no duplication, etc.
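A minimal Python sketch of the syntax rules above, under stated assumptions: the name is split at the first "/" (the suffix may itself contain "/"); case-insensitive comparison folds only the ASCII letters a-z, a simplification of this sketch; and URL encoding uses the standard library's `urllib.parse.quote`, which is deliberately conservative and escapes more than the %, ", # and space the Handbook singles out. The helper names are hypothetical.

```python
from __future__ import annotations

import string
from urllib.parse import quote

# Fold only ASCII a-z to A-Z (a simplifying assumption of this sketch).
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first '/'."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected prefix/suffix, got {doi!r}")
    return prefix, suffix


def same_doi(a: str, b: str) -> bool:
    """DOI names are case insensitive: compare after ASCII case folding."""
    return a.translate(_ASCII_UPPER) == b.translate(_ASCII_UPPER)


def doi_for_url(doi: str) -> str:
    """Percent-encode a DOI name for a URL path, keeping '/' literal.

    quote() leaves only letters, digits, '_.-~' and '/' unencoded, so %, ",
    #, space, < and > are all escaped.
    """
    return quote(doi, safe="/")


if __name__ == "__main__":
    print(split_doi("10.1223/4567"))
    print(same_doi("10.123/ABC", "10.123/AbC"))
    print(doi_for_url('10.1000/a b#"1%'))
```

Escaping more characters than strictly necessary is a safe default: a percent-encoded letter-for-letter form resolves to the same name.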

DOI syntax: prefix

• The prefix always begins with 10 (by convention)
  – In practice, 10 is the Handle System prefix allocated to the IDF
  – If it doesn’t begin with 10, it’s not a DOI name (but it may be a Handle)

• The prefix may be any length, but four digits are currently used after the "10.", e.g. 10.1234/456-mydoc-456584893489

• The prefix may be further subdivided, e.g. 10.1234.456.7/4851234
  – Current DOI System practice is not to do so unless there is a specific requirement
  – Such subdivisions are peers (10.123 is at the same level as 10.123.456), but can be specifically configured to be a hierarchy
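The prefix rules can be sketched as a small check. This is an illustration only, assuming that "begins with 10" means the first dot-separated component of the prefix is exactly "10"; components are not required to be digits and no length is enforced, since the prefix "may be any length". Function names are hypothetical.

```python
from __future__ import annotations


def prefix_components(doi: str) -> list[str]:
    """Return the dot-separated components of a DOI name's prefix.

    Raises ValueError unless the prefix starts with '10.' (the Handle
    prefix allocated to the IDF).
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix:
        raise ValueError(f"expected prefix/suffix, got {doi!r}")
    parts = prefix.split(".")
    if parts[0] != "10" or len(parts) < 2 or "" in parts:
        raise ValueError(f"not a DOI name (prefix must start '10.'): {doi!r}")
    return parts


def is_doi_name(handle: str) -> bool:
    """True if the string has a DOI prefix; other strings may still be Handles."""
    try:
        prefix_components(handle)
    except ValueError:
        return False
    return True
```

The sketch treats 10.123 and 10.123.456 simply as different prefix strings; it does not model them as parent and child, which matches the default "peers" behaviour described above.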

DOI syntax: suffix

• The suffix may be any length.

• The suffix may incorporate another identifier numbering scheme (or may be new):
  – e.g. 10.1234/ISBN 0-7894-7764-5
  – the DOI System treats all DOI names as “dumb strings”
  – take care if the other identifier contains special characters (e.g. the < > in a SICI)

• If not using another identifier, the assigner needs to devise some way of allocating numbers.

• Using DOI names may obviate the need to adopt or create a new scheme, e.g. in CrossRef:
  – Publisher A uses PII: S1384107697000225
  – Publisher B uses SICI: 0361-9230(1997)42:<OaEoSR>2.0.TX;2-B
  – Publisher C uses its own numbers: JoesPaper56
  These three schemes are not interoperable, but become so in the DOI System as:
  – doi:10.2345/S1384107697000225
  – doi:10.4567/0361-9230(1997)42:<OaEoSR>2.0.TX;2-B
  – doi:10.6789/JoesPaper56

• A particular Registration Agency may (and probably should) determine specific rules or recommendations for its own DOI name registrants and applications.
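A short sketch of the CrossRef example: each publisher's existing identifier is appended unchanged as a suffix, and the result is escaped when it is placed in XML. The publisher-to-prefix table reuses the example prefixes from the slide but is otherwise hypothetical, as is the `<doi>` element name; the escaping uses the standard library's `xml.sax.saxutils.escape`, which handles &, < and >.

```python
from xml.sax.saxutils import escape

# Hypothetical prefix allocation, reusing the example prefixes from the slide.
PREFIX_FOR_PUBLISHER = {"A": "10.2345", "B": "10.4567", "C": "10.6789"}


def make_doi(publisher: str, local_id: str) -> str:
    """Append an existing local identifier, unchanged, as the suffix."""
    return f"{PREFIX_FOR_PUBLISHER[publisher]}/{local_id}"


def as_xml_element(doi: str) -> str:
    """Wrap a DOI name in a hypothetical <doi> element, escaping &, < and >."""
    return f"<doi>{escape(doi)}</doi>"


if __name__ == "__main__":
    sici = "0361-9230(1997)42:<OaEoSR>2.0.TX;2-B"
    print(make_doi("B", sici))
    print(as_xml_element(make_doi("B", sici)))
```

Because the DOI System treats the suffix as a dumb string, no parsing of the PII, SICI or local number is needed; only the escaping step depends on the characters the other scheme happens to use.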

Visual presentation of DOI name

• When displayed on screen or in print, a DOI name is preceded by a lowercase "doi:" unless the context clearly indicates that a DOI name is implied.
  – EXAMPLE: the DOI name 10.1006/jmbi.1998.2354 is displayed as doi:10.1006/jmbi.1998.2354.

• The use of the lowercase string “doi” follows the specification for representation as a URI (http://www.ietf.org/rfc/rfc2396.txt), as for e.g. "ftp:" and "http:".

• When displayed in web browsers, the DOI name itself may be attached to the address of an appropriate proxy server, to enable resolution of the DOI name via a standard web hyperlink.
  – EXAMPLE: the DOI name 10.1006/jmbi.1998.2354 could be made an active link as http://dx.doi.org/10.1006/jmbi.1998.2354.
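The two presentation forms above can be produced mechanically. A minimal sketch, assuming the proxy address is passed in as a parameter: the default is the http://dx.doi.org/ address used in the example, while current DOI display guidance prefers https://doi.org/. The DOI name is percent-encoded before it is appended, as the encoding notes on the syntax slide require.

```python
from urllib.parse import quote

# Proxy address from the slide; current display guidance prefers https://doi.org/.
DEFAULT_PROXY = "http://dx.doi.org/"


def display_form(doi: str) -> str:
    """Screen/print form: the DOI name preceded by lowercase 'doi:'."""
    return "doi:" + doi


def proxy_link(doi: str, proxy: str = DEFAULT_PROXY) -> str:
    """Attach the percent-encoded DOI name to a proxy server address."""
    return proxy + quote(doi, safe="/")


if __name__ == "__main__":
    print(display_form("10.1006/jmbi.1998.2354"))
    print(proxy_link("10.1006/jmbi.1998.2354"))
```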

Scope of the DOI System

• Digital Object Identifier = Digital [Object Identifier], not [Digital Object] Identifier: a digital identifier of an object, not an identifier of a digital object

• “The DOI® System provides an infrastructure for persistent unique identification of entities ... A DOI name is permanently assigned to an object, to provide a persistent link to current information about that object, including where the object, or information about it, can be found on the internet”.

• This is because entities of interest may be physical, digital, or abstract
  – e.g. CrossRef assigns a DOI name to an “article” irrespective of format

• Handle: Digital Object Architecture
  – Not a conflict: any entity can be abstracted into a representation as a digital object

Scope of the DOI System

• A DOI name may be assigned to any object of any form whenever there is a functional need to distinguish it as a separate entity.

• Registration Agencies may specify more constrained rules for the assignment of DOI names to objects for DOI-related services.

• “The principal focus of assignment shall be to content-related entities exemplified by, but not limited to: text documents; data sets; sound carriers; books; photographs; serials; audio, video and audiovisual recordings; software; abstract works; artwork, etc., and related entities in their management, e.g. licences, parties”.

Uniqueness

• Each DOI name can specify one and only one referent in the DOI System.
  – A role of Registration Agencies is to provide a service to registrants which facilitates this.
  – In addition, the DOI System will not accept a duplicate prefix+suffix, and makes internal checks for uniqueness at the time of registration.

• A referent may be specified by more than one DOI name, though it is recommended practice that each referent has only one DOI name.
  – Multiple names can arise because it may not always be known that a DOI name already exists.
  – Where multiple DOI names are assigned to the same referent, e.g. through assignment of DOI names by two different Registration Agencies, the IDF encourages Registration Agencies to collaborate in providing a unifying record for that referent.

• It is good practice never to reissue a unique identifier once it has been issued, even if it was issued in error.
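The asymmetry above (one name, one referent; one referent, possibly several names) can be illustrated with a toy registry. This is a hypothetical in-memory model, not how the DOI System is implemented: it rejects a duplicate prefix+suffix ignoring ASCII case, since DOI names are case insensitive, and it offers no way to delete a name, so a name once issued can never be reissued.

```python
from __future__ import annotations

import string

_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


class DuplicateDOIError(ValueError):
    """Raised when a prefix+suffix is already registered."""


class ToyRegistry:
    """Hypothetical in-memory registry illustrating the uniqueness rules."""

    def __init__(self) -> None:
        self._referent_of: dict[str, str] = {}     # case-folded name -> referent
        self._names_of: dict[str, list[str]] = {}  # referent -> names as registered

    def register(self, doi: str, referent: str) -> None:
        key = doi.translate(_ASCII_UPPER)
        # One DOI name, one referent: reject duplicates, ignoring ASCII case.
        if key in self._referent_of:
            raise DuplicateDOIError(doi)
        self._referent_of[key] = referent
        # One referent may have several names (allowed, though discouraged).
        self._names_of.setdefault(referent, []).append(doi)

    def referent(self, doi: str) -> str:
        return self._referent_of[doi.translate(_ASCII_UPPER)]

    def names_for(self, referent: str) -> list[str]:
        return list(self._names_of.get(referent, []))
```

Listing all names for a referent is the kind of information a "unifying record" would need to bring together when two agencies have each assigned a name.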

Persistence

• No time limit for the existence of a DOI name shall be assumed in any assignment, service or application.

• A DOI name and its referent are unaffected by changes in the rights associated with the referent, or by changes in the management responsibility for the referent object.

• The IDF implements rules for transfer of management responsibility between Registration Agencies, requirements on Registration Agencies for maintenance of records, default resolution services, and technical infrastructure resilience.

• The DOI System is not a means of archival preservation of identified entities.

• The DOI System provides a means of continuing interoperability between different systems, through exchange of meaningful information about identified entities and initiated actions; at a minimum, this rests on persistence of the DOI name and of the description of its referent.

Intellectual property and the DOI System

[Figure: the <indecs> model, View 2: commerce. Entities Party, Creation and Transaction, linked by "makes", "uses", "do" and "about". Caption: Current DOI name uses.]

DOI names with existing identifiers

• Identifier schemes already exist for many creations
  – ISBN, ISSN, ISRC, etc.
  – New ones, e.g. ISTC (textual abstractions, e.g. “Robinson Crusoe by Daniel Defoe”)

• ISO standardisation of the DOI System recognises this

• First example: “Bookland DOIs” from ISBNs
  – The name comes from the “Bookland” bar codes derived from ISBNs

• Pilot scheme based on the new syntax of the ISBN-13
  – ISBN: 978-86-123-4567-8
  – DOI name to be: 10.978.86123/45678

• Second example: ISSN
  – Defined syntax for ISSNs in DOI names:
  – doi:10.5555/issnl.1234-5678 (linking ISSN: all media versions)
  – doi:10.5555/issn.1234-5678 (ISSN: specific media version)

• NB: Relevant information as to the identity of the referent is included in the metadata associated with the DOI name string.
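Both examples amount to a fixed transformation from the existing identifier to a DOI name. The sketch below assumes a Bookland mapping rule inferred from the single ISBN example on the slide (prefix 10.<EAN prefix>.<group><registrant>, suffix <publication><check digit>); the pilot's actual rule may differ. Hyphens are required because group and registrant boundaries cannot be recovered from the bare 13 digits without the ISBN range tables. No ISBN or ISSN check digits are validated, and the 10.5555 default is simply the example prefix from the slide.

```python
def bookland_doi(isbn13: str) -> str:
    """Map a hyphenated ISBN-13 to the pilot 'Bookland DOI' form.

    Assumed rule, inferred from one example:
    10.<EAN prefix>.<group><registrant>/<publication><check digit>
    """
    parts = isbn13.split("-")
    if len(parts) != 5 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a hyphenated ISBN-13, got {isbn13!r}")
    if len("".join(parts)) != 13:
        raise ValueError(f"expected 13 digits, got {isbn13!r}")
    ean, group, registrant, publication, check = parts
    return f"10.{ean}.{group}{registrant}/{publication}{check}"


def issn_doi(issn: str, linking: bool = False, prefix: str = "10.5555") -> str:
    """Build an ISSN-based DOI name; 'issnl' marks the linking ISSN."""
    label = "issnl" if linking else "issn"
    return f"{prefix}/{label}.{issn}"


if __name__ == "__main__":
    print(bookland_doi("978-86-123-4567-8"))
    print(issn_doi("1234-5678", linking=True))
```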

DOI names with existing identifiers

• General case: ISO standardisation of the DOI System
  – “A DOI name is not intended as a replacement for other identifier schemes, but when used with them may enhance the identification functionality provided by those systems with additional functionality…”

• Incorporate the other identifier into the DOI syntax, and/or

• Record the other identifier in the DOI metadata.

• Each scheme retains its autonomy, but the schemes work together
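The second option, recording the other identifier in the DOI metadata, means the DOI name need not contain it at all. A minimal sketch with a hypothetical record type and hypothetical DOI names (the ISBN is the one from the suffix example): a reverse index keyed on (scheme, value) lets each scheme keep its own syntax while still leading to the DOI name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DOIRecord:
    """Hypothetical minimal record: a DOI name plus other identifiers in its metadata."""
    doi: str
    other_ids: dict[str, str] = field(default_factory=dict)  # scheme -> value


def index_by_other_id(records: list[DOIRecord]) -> dict[tuple[str, str], str]:
    """Build a lookup from (scheme, value) to DOI name."""
    index: dict[tuple[str, str], str] = {}
    for rec in records:
        for scheme, value in rec.other_ids.items():
            index[(scheme, value)] = rec.doi
    return index
```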

DOI names for entities other than “creations”

• Parties
  – Authors: for disambiguation, etc.
  – Institutions: for licensing transactions, etc.
  – ISNI: International Standard Name Identifier (formerly ISPI), based on InterParty “PIDI = Public identity identifier”
  – ITU Identity Management Focus Group: any end point in the network (machines, users)

• Licences
  – ONIX for licensing work (with NISO/ERMI; ERMI = Electronic Resource Management Initiative)
  – Contextual identification

Granularity

• Granularity: the extent to which a collection of information has been subdivided for purposes of identification (e.g. a collection; a book; tables and figures)
  – Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished

• Your functional granularity may not be my functional granularity:
  – A wants to distinguish “this book in any format”, but B wants to distinguish “the pdf version” from “the html version”, etc.

• “It is a fundamental of almost any statistic that, to produce it, something, somewhere has been defined and identified. Never underestimate how much nuisance that small practical detail can cause. First, it has to be agreed what to count…. In maths numbers seem hard, pristine and bright, neatly defined around the edges. In life, we do better to think of something murkier and softer”
  – Blastland & Dilnot, “The Tiger That Isn’t: Seeing Through a World of Numbers” (2007)

• You must know (and state) PRECISELY WHAT is being identified

Granularity

• A DOI name may be assigned to any entity, regardless of the extent to which it may be a component part of some larger entity. DOI names may be assigned at arbitrary levels of granularity or abstraction.

• EXAMPLE: separate DOI names may be assigned to:
  – a novel as an abstract work;
  – a specific edition of that novel;
  – a specific chapter within that edition of the novel;
  – a single paragraph;
  – a specific image or quotation;
  – each specific manifestation in which any of those entities are published or otherwise made available;
  – “or any other level of granularity which a registrant deems to be appropriate”

• Assignment of a DOI name shall require the Registrant to record metadata describing the entity to which the DOI name is being assigned. The metadata shall describe the entity to the degree that is necessary to distinguish it as a separate entity within the DOI System. In certain cases (which shall be defined in the User Manual) it shall be allowable for no metadata declaration to be made.
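A sketch of assignment at several levels of granularity, each with its own describing metadata. The entity type, the single "part of" link and the DOI names are hypothetical; a real data model distinguishes many more relationships (edition of, manifestation of, and so on). This sketch always requires metadata, ignoring the exceptions that the User Manual may allow.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    doi: str
    kind: str                    # e.g. "abstract work", "edition", "chapter"
    description: str             # metadata distinguishing this entity
    part_of: str | None = None   # DOI name of the containing entity, if any


def register(registry: dict[str, Entity], entity: Entity) -> None:
    """Assign a DOI name only if describing metadata is supplied."""
    if not entity.description.strip():
        raise ValueError(f"no metadata for {entity.doi}")
    if entity.part_of is not None and entity.part_of not in registry:
        raise KeyError(entity.part_of)
    registry[entity.doi] = entity


def containers(registry: dict[str, Entity], doi: str) -> list[str]:
    """Walk part_of links upward: the larger entities this one belongs to."""
    chain = []
    parent = registry[doi].part_of
    while parent is not None:
        chain.append(parent)
        parent = registry[parent].part_of
    return chain
```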

Specifying what is identified

[Figure: a manuscript (mss #ABC123) and a paper (journal/volume/page). Two things in one: a physical manifestation of an intangible work (which is identified?)]

[Figure: identifiers at different levels: MS; Vol/page, ISBN, SICI, etc.; web page URL; each with the “intangible Work”. “Work” is used in the analytical sense, not the copyright sense.]


Versions – separately identified?

What are we identifying?

Document on screen: Abstract work? Manifestation of abstract work? Version? This HTML file? All/some of these?

Does it matter?

Yes, it can, e.g.:

1. Practical use of data. Example: journal article
  – For the purpose of citation:
    • count pdf, print and html as the same
    • the citation refers to the abstract work (hence ISI, CrossRef)
  – For the purpose of purchase:
    • count pdf, print and html as different
    • the purchase refers to the manifestation
  – Suppose I encounter a purchase system and try to use it for counting citations….
  – Can I rely on a system now if I don’t know what is being identified? Can others rely on the system long after I’m gone?

2. Legal implications: copyright (“My A is the same as your B and is my copyright…”)
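A small illustration of why the level of identification matters for counting. The DOI names and the manifestation-to-work table are hypothetical: the same three events give one answer when counted at the level of the abstract work (the citation view) and a different answer when counted per manifestation (the purchase view).

```python
from collections import Counter

# Hypothetical DOI names: three manifestations of one abstract work.
WORK_OF = {
    "10.5555/art1.pdf": "10.5555/art1",
    "10.5555/art1.html": "10.5555/art1",
    "10.5555/art1.print": "10.5555/art1",
}


def count(events: list[str], level: str) -> Counter:
    """Count events recorded against manifestations, at the chosen granularity."""
    if level == "work":           # citation view: formats count as the same
        return Counter(WORK_OF[m] for m in events)
    if level == "manifestation":  # purchase view: formats count as different
        return Counter(events)
    raise ValueError(f"unknown level: {level!r}")
```

Feeding manifestation-level purchase records into a citation count without the work mapping would silently split one article's total across its formats.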

The <indecs> framework

Principles:
• Unique Identification: every entity should be uniquely identified within an identified namespace.
• Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished.
• Designated Authority: the author of an item of metadata should be securely identified.
• Appropriate Access: everyone requires access to the metadata on which they depend, and privacy and confidentiality for their own metadata from those who are not dependent on it.

• Definition of metadata: an item of metadata is a relationship that someone claims to exist between two referents (description).

More on this: see “data model”
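The <indecs> definition of metadata, a claimed relationship between two referents, maps naturally onto a record that also names its claimant, which is what Designated Authority asks for. A minimal sketch: the relation names and party identifiers are hypothetical, and only the ISBN-13 comes from the first class naming example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataItem:
    """One claimed relationship between two referents, with its claimant."""
    subject: str     # identifier of the first referent
    relation: str    # hypothetical relation name
    obj: str         # identifier of the second referent
    claimed_by: str  # identifier of the party making the claim


def who_claims(items: list[MetadataItem], subject: str,
               relation: str, obj: str) -> set[str]:
    """Return the parties asserting a given relationship."""
    return {i.claimed_by for i in items
            if (i.subject, i.relation, i.obj) == (subject, relation, obj)}
```

Keeping the claimant inside each item means conflicting claims can coexist and be told apart, rather than one silently overwriting the other.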

• Many of the items we manage should be treated as “first-class objects”.

• First class = having an identity independent of any other item.
  – A key concept of the Digital Object Architecture (e.g. Handles)

[Figure: first class naming. Examples: www.acme.com/document456 and Document456; Vanity Fair and Penguin Classics: Vanity Fair, ISBN-13: 978-0-141-43983-9; www.acme.com, www.newco, and www.acme.com/doc456 alongside the first class name doc456.]
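The point of first class naming is indirection: the name is held separately from the location, so a move changes a directory entry rather than the name. A minimal sketch with a hypothetical resolver class; the new location uses a placeholder host, since the slide's "www.newco" address is incomplete.

```python
from __future__ import annotations


class Directory:
    """Hypothetical resolver: first class names map to current locations."""

    def __init__(self) -> None:
        self._location: dict[str, str] = {}

    def register(self, name: str, url: str) -> None:
        self._location[name] = url

    def move(self, name: str, new_url: str) -> None:
        # Only the directory entry changes; the name itself is untouched.
        if name not in self._location:
            raise KeyError(name)
        self._location[name] = new_url

    def resolve(self, name: str) -> str:
        return self._location[name]
```

Anyone holding the name "doc456" keeps a working reference after the move, whereas a stored URL such as www.acme.com/doc456 would not.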

The use of prefixes

• A DOI name consists of a prefix and a suffix, e.g. 10.1223/4567

• A prefix can have unlimited suffixes
  – So in theory, only one prefix is needed?
  – Could a set of DOI names ever need to be managed differently, e.g. separated across DOI RAs, or different mirror servers, etc.?

• CrossRef example:
  – The prefix is allocated to a publisher (imprint), not a journal
  – Would it be better to have a separate prefix for each journal? Journals can move publisher.
  – It is easy to manage one prefix on an everyday basis (ISBN, etc.): a whole customer’s DOI name set is managed by one prefix
  – But it is easiest to group DOI names by separate prefixes if you need to change them…
  – A trade-off
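The trade-off can be made concrete with a hypothetical cost model, which is an assumption of this sketch and not a DOI System rule: when a journal moves publisher, a prefix used only by that journal can be handed over in one step, while a prefix shared with other journals forces name-by-name changes. The DOI names and journal labels are made up.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_prefix(dois: list[str]) -> dict[str, list[str]]:
    """Group DOI names by the part before the first '/'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doi in dois:
        groups[doi.partition("/")[0]].append(doi)
    return dict(groups)


def changes_needed(dois: list[str], journal_of: dict[str, str], journal: str) -> int:
    """Hypothetical number of management changes to move one journal's names."""
    groups = group_by_prefix(dois)
    moving = {d.partition("/")[0] for d in dois if journal_of[d] == journal}
    changes = 0
    for prefix in moving:
        members = groups[prefix]
        if all(journal_of[d] == journal for d in members):
            changes += 1  # prefix used only by this journal: one hand-over
        else:
            changes += sum(1 for d in members if journal_of[d] == journal)
    return changes
```

With two articles in journal X, a shared publisher prefix costs two changes while a per-journal prefix costs one; the gap grows with the size of the journal's back file.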

Administrative granularity

• Who will need to administer the prefix?
  – IDF Directory Manager
  – RA manager
  – Individual customer of an RA (e.g. a publisher)
  – Individual manager within a publisher (e.g. production manager)

• Prefixes can have a defined administrator
  – Similarly, URLs rely on one site administrator

• But also: DOI names can have any level of administrative granularity
  – Every single DOI name could have a different manager!
  – The Handle System has various levels of administrator, and keys
  – A choice which must depend on each application’s requirements
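One way to picture "any level of administrative granularity" is a lookup in which a per-name administrator, if set, overrides the prefix administrator. This is a hypothetical policy for illustration only; in the Handle System itself, administrators are recorded as HS_ADMIN values in each handle's record. The administrator identifiers are hypothetical apart from acme.admin/jsmith, taken from the Handle data example.

```python
from __future__ import annotations


def admin_for(doi: str, prefix_admins: dict[str, str],
              name_admins: dict[str, str]) -> str:
    """Return the administrator for a DOI name.

    Hypothetical policy: a per-name administrator, if set, overrides
    the administrator defined for the prefix.
    """
    if doi in name_admins:
        return name_admins[doi]
    return prefix_admins[doi.partition("/")[0]]
```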

Handles resolve to typed data

Handle       Data type   Index   Handle data
10.123/456   URL         1       http://acme.com/….
             URL         2       http://a-books.com/….
             DLS         9       acme/repository
             HS_ADMIN    100     acme.admin/jsmith
             XYZ         100111001111012

Rules for data type construction: www.handle.net/overviews/types.html
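The table can be modelled as a list of typed, indexed values, with clients selecting the values of the type they understand. A minimal sketch: the value class and lookup function are hypothetical, the sketch simply orders matches by index, and because the URLs in the table are elided, the URLs used here are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    index: int
    type: str
    data: str


def values_of_type(record: list[HandleValue], value_type: str) -> list[str]:
    """Return the data of every value with the given type, ordered by index."""
    return [v.data for v in sorted(record, key=lambda v: v.index)
            if v.type == value_type]


# Modelled on the table above; the URLs there are elided, so these are placeholders.
RECORD = [
    HandleValue(100, "HS_ADMIN", "acme.admin/jsmith"),
    HandleValue(2, "URL", "http://a-books.example/456"),
    HandleValue(1, "URL", "http://acme.example/456"),
    HandleValue(9, "DLS", "acme/repository"),
]
```

A web browser following the DOI name would use the URL values, while another application could read the DLS or HS_ADMIN values from the same record.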
