Digital Object Identifier

Norman Paskin, International DOI Foundation


What is DOI?

• A unique identifier for “a piece of content” on digital networks
• Digital object interoperability

[Diagram: the four DOI components: numbering scheme; description by structured metadata; resolution by Handle; policies]

Analogy: the physical bar code

• A unique identifier for "a piece of content" in the physical world
• Single, common system: UPC/EAN bar code
• Components: code writers, readers, policies, etc.
• Many uses: once assigned, usable by anyone in the chain
• Wide community support made it work
• Self-sustaining cost-recovery model, etc.
• Standard: helps to integrate systems efficiently

What is the DOI?

"The DOI is the UPC (Bar Code) for objects of intellectual property on the Internet."

• 1. Uniquely identifies “content”: enables management of transactions of all kinds
• 2. Provides a stable, persistent link to the content itself or to services
• 3. Can be used to articulate services as real-world applications, using metadata, multiple resolution, rules, etc.

This presentation

• Show DOI as a combination of components
– use existing standards, including Handle
• Show examples of services (applications) built on DOI
– examples here are web-based
– but DOI is applicable to all platforms

DOI syntax can include any existing identifier, formal or informal, of any entity

• An identifier “container”, e.g.
– 10.1234/5678
– 10.2341/0-7645-4889-1
– 10.5678/978-0-7645-4889-4
– 10.1000/ISBN 0764548891
– 10.1234/Norman_presentation
– 10.2224/2003-1-29-CENDI-DOI
– etc.
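The "container" idea can be illustrated with a minimal Python sketch. It assumes only what the examples above show: a prefix beginning with "10.", a "/" separator, and a suffix that may hold any existing identifier. The names Doi and split_doi are hypothetical, and this is not a full validator for the DOI syntax.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # part before the first "/", e.g. "10.1234"
    suffix: str  # opaque part after it; may embed any existing identifier


def split_doi(doi: str) -> Doi:
    """Split a DOI string at the first '/' into prefix and suffix."""
    prefix, sep, suffix = doi.partition("/")
    # Assumption: every prefix starts with "10." followed by at least one character.
    if not sep or not prefix.startswith("10.") or len(prefix) <= 3 or not suffix:
        raise ValueError(f"not of the form 10.<prefix>/<suffix>: {doi!r}")
    return Doi(prefix, suffix)


examples = [
    "10.1234/5678",
    "10.5678/978-0-7645-4889-4",
    "10.1000/ISBN 0764548891",
    "10.1234/Norman_presentation",
]
for doi in examples:
    print(split_doi(doi))
```

The suffix is deliberately left uninterpreted: the sketch treats it as an opaque string, which is what lets an ISBN, a date, or an informal name sit inside it unchanged.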


Handle resolution allows a DOI to link to any and multiple pieces of current data

• Resolve from DOI to:
– Location (URL): persistence
• Resolve to multiple data:
– Multiple locations
– Metadata
– Services
– Nested DOIs (related objects, etc.)
– Extensible: new types
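Multiple resolution can be pictured as a record holding several typed values for one DOI. The Python sketch below is an in-memory illustration only: the type names, the example.org URLs and the resolve function are assumptions made for the example, not the Handle System's actual record format or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HandleValue:
    kind: str  # illustrative type name, e.g. "URL", "METADATA", "DOI"
    data: str


# Hypothetical record for one DOI; every value here is made up.
RECORDS = {
    "10.1234/5678": [
        HandleValue("URL", "http://example.org/mirror-a/5678"),
        HandleValue("URL", "http://example.org/mirror-b/5678"),
        HandleValue("METADATA", "http://example.org/meta/5678.xml"),
        HandleValue("DOI", "10.1234/5679"),  # nested DOI: a related object
    ],
}


def resolve(doi: str, kind: Optional[str] = None) -> List[HandleValue]:
    """Return all values for a DOI, or only those of one type."""
    values = RECORDS.get(doi, [])
    if kind is None:
        return list(values)
    return [v for v in values if v.kind == kind]


print(resolve("10.1234/5678", "URL"))  # both locations
print(resolve("10.1234/5678"))         # everything in the record
```

Adding a new kind of value needs no change to resolve, which is the sense in which the list of types is extensible.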


<indecs> framework: DOI can describe any form of intellectual property, at any level of granularity

• Metadata
• For interoperability
• Kernel metadata
– A standard, interoperable, small set of data
• Able to use existing metadata
– Mapped using a standard dictionary
• Providing a standard way of accessing and using the object
– “Hooks” to OpenURL, UDDI, etc.
– DOI Applications, Services


DOI policies allow any business model for practical implementations

• Common rules of the road (IDF)
– Governance and agreed scope, policy, rules
• Cost-recovery (self-sustaining)
• Registration agencies (cf. ISBN, Visa)
• Each can develop its own applications, services, sector rules, business model, fees, metadata, etc.
– DOI at cost
– DOI free
– DOI with other services
– etc.


[Diagram: the four components combined, the whole being extensible]

• Numbering: DOI syntax can include any existing identifier, formal or informal, of any entity
• Resolution: Handle resolution allows a DOI to link to any and multiple pieces of current data
• Metadata: with the <indecs> framework, DOI can describe any form of intellectual property, at any level of granularity
• Policies: DOI policies allow any business model for practical implementations

DOI components

• The combination of components is unique
• Aim to use existing standards or, if not available, to develop standards with others
• Numbering: standard principles (naming authorities, delegated responsibility, uniqueness, non-intelligent numbering, etc.)
• Resolution: DOI is a Handle implementation (initially single, now multiple resolution; close collaboration with CNRI as technology partner)
• Metadata: indecs framework (initially the <indecs> consortium, now ISO MPEG)
• Policies: based on similar business models (UPC, ISBN, Visa, etc.)

DOI: development path

[Diagram labels: a continuing development activity; activity tracking; initial implementation; full implementation; single redirection (persistent identifier); metadata; multiple resolution; W3C, WIPO, NISO, ISO, MPEG, etc.]

Persistent identifier

• Resolution provides persistence
• Easily seen in web applications: the DOI never changes, but the URL does

[Diagram: printed identifiers, bookmarks, etc. point to the content through many URLs]

[Diagram: the same URLs now return "404 File not found"]

"Linkrot": recent estimates 16% in 6 months

[Diagram: DOIs point to a DOI directory, which holds the URLs for the content; the assigner is shown alongside the directory]
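The persistence argument comes down to one level of indirection: citations carry the DOI, and only the directory entry changes when content moves. A minimal Python sketch of that idea follows; the dictionary standing in for the DOI directory, the function names and the example URLs are all hypothetical.

```python
# Hypothetical in-memory stand-in for the DOI directory.
directory = {"10.1234/5678": "http://old-host.example/articles/5678.html"}


def resolve(doi: str) -> str:
    """Look up the current location for a DOI."""
    return directory[doi]


def move_content(doi: str, new_url: str) -> None:
    """The assigner updates one directory entry; citations are untouched."""
    directory[doi] = new_url


citation = "10.1234/5678"  # what appears in print, bookmarks, etc.
before = resolve(citation)
move_content(citation, "http://new-host.example/content/5678")
after = resolve(citation)

print(citation, "->", before)
print(citation, "->", after)
```

The printed identifier is the same string before and after the move, which is the property a bare URL lacks.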

[Diagram: DOIs resolve over the Internet through multiple DOI directories to the content; the assigner is shown alongside the directories]

More than just "locate"

[Diagram: a DOI resolves through the DOI directory to a response page from the assigner]

Response page:
• purchase content
• view free excerpt
• get related items
• get additional metadata
• request permissions

[Diagram: the same response page, with the “purchase content” option linked to a bookstore]

The “metadata” component

• “Interoperability of data in e-commerce systems”
• <indecs>: a multi-partner effort; see www.indecs.org
• Was adopted and is now the basis of ISO MPEG-21
– Dictionary approach: see the paper “Towards a Rights Data Dictionary”
• Unique Identification
• Functional Granularity
• Designated Authority
• Appropriate Access
• Metadata as “a relationship between two entities”

The “metadata” component: why this has been important to DOI

• Precision: a consistent, extensible framework, for automation
• Terminology: defined the scope of “content” more precisely
– In relation to “Digital Objects”, W3C “resources”, WIPO “works”
– Description by precise attributes, ontology
• Ability to interoperate with any existing metadata
– SCORM, MARC, ONIX, etc.
• Link to standards work like MPEG, XML
• A way of defining “Application Profiles”
– sets of metadata plus rules, a way of grouping DOIs
– the basis of applications beyond simple persistence
– documentation now being completed

Metadata efficiency

<indecs> framework: DOI can describe any form of intellectual property, at any level of granularity

• Text objects (ONIX)
• Art objects (CIDOC)
• Learning objects (SCORM)
• Audio objects (GRID)
• Video objects (SMPTE)
• etc.
• Common single mapping

Adding value: services

• Acrobat plug-in as the focus example here (web-based)
• Four example demonstrations shown here:
– Version (provide a dynamically updated version of the PDF in hand)
– Multiple resolution (retrieve multiple data: a URL and some metadata in this case)
– CrossRef (retrieve a standard set of metadata and use it in an application, a citation builder)
– Rights (a very simple e-commerce interface as an illustration)

Adobe plug-in concept: what

[Diagram: a PDF document carrying doi:10.123/456, viewed through Acrobat Reader; a plug-in (with cache) adds tool bar buttons such as "Some Service" and "Another Service"]

• DOI is not visible: it sits within the PDF package (like File/Properties in Word, etc.)
• Buttons "pop up" dynamically as services become available

Demo 1 – “get latest version”

[Screenshots: the Acrobat tool bar; the request goes over the Internet to the Handle System]

Handle record for DOI cnri.test.jsn/pdf:

TYPE            DATA
url             http://host-4-211/book-newversion.pdf
last_modified   2002-06-13T14:06:03-03:00
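The "get latest version" check can be sketched as a comparison between the timestamp of the PDF in hand and the last_modified value in the Handle record. The Python below uses the url and last_modified values shown on the slide; the function name and the idea that the client knows its own copy's timestamp are assumptions, and the real plug-in's logic is not documented here.

```python
from datetime import datetime
from typing import Optional

# Values taken from the Handle record shown on the slide.
handle_record = {
    "url": "http://host-4-211/book-newversion.pdf",
    "last_modified": "2002-06-13T14:06:03-03:00",
}


def newer_version_url(local_modified: str, record: dict) -> Optional[str]:
    """Return the record's URL if it is newer than the local copy, else None.

    Both timestamps are ISO 8601 strings with a UTC offset, so the
    comparison is between absolute instants, not local clock times.
    """
    local = datetime.fromisoformat(local_modified)
    remote = datetime.fromisoformat(record["last_modified"])
    return record["url"] if remote > local else None


# A copy saved in January 2002 is older than the record, so a URL comes back.
print(newer_version_url("2002-01-15T09:00:00+00:00", handle_record))
# A copy saved after the record's timestamp is already current.
print(newer_version_url("2002-07-01T00:00:00+00:00", handle_record))
```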

Demo 2 – Multiple Resolution

[Screenshots: tool bar and related links]

Demo 3 – Citation

[Screenshots: tool bar]

Demo 4 – Permissions

[Screenshots: tool bar with a Rights button; XMP]

What we have done

• Put the DOI data in functional units in the DOI record [Handle], and the knowledge of what to do with them in the client
– Demonstrated with an end-user client (Acrobat) but equally applicable to middleware
– No constraints on adding additional functional units to a given DOI
– A common approach: could use the same Handle record to manage PDF, HTML, mobile, etc., hence efficient in deploying content across platforms
– The resolution to returned metadata through Application Profiles allows complex applications
• Provided a complete packaged solution: numbering, resolution, metadata, policies
– On which individual applications and services can be built
– The same additional components could be of interest to other Handle applications: metadata, policies
– Avoid reinvention of the wheel

Handles in DOI

10.123/456 (created and maintained by content providers)
  URL          http://www....
  DOI_AP       10.AP/2
  DOI_ATR      10.ATR/Latest; 22/10/2002
  DOI_AP       10.AP/1; KMD; RA; URL

10.AP/2 (AP, i.e. service aggregation; DOI to be defined and maintained by Registration Agencies)
  Desc         Some description
  DOI_Service  10.Service/Metadata; Schema23; http://...
  DOI_Service  10.Service/Latest

10.Service/Latest (service description DOI, to be defined by service providers)
  Desc         Some description
  IDL          IDL description
  Java         Java Interface
  WSDL         Soap Binding
  IOR          IOR:0001100...

10.Service/Metadata (service description DOI, to be defined by service providers)
  Desc         Some description
  IDL          IDL description
  Java         Java Interface
  WSDL         Soap Binding
  IOR          IOR:0001100...
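The chain from a content DOI to its Application Profile and on to service descriptions can be followed with a short Python sketch. It stores the records from the slide as plain lists of (type, data) pairs, keeping the elided values exactly as shown; the helper names and the convention that the first ";"-separated field of a value is the target handle are assumptions for illustration.

```python
# Records copied from the slide; elided values ("http://www....") are kept as-is.
records = {
    "10.123/456": [
        ("URL", "http://www...."),
        ("DOI_AP", "10.AP/2"),
        ("DOI_ATR", "10.ATR/Latest; 22/10/2002"),
        ("DOI_AP", "10.AP/1; KMD; RA; URL"),
    ],
    "10.AP/2": [
        ("Desc", "Some description"),
        ("DOI_Service", "10.Service/Metadata; Schema23; http://..."),
        ("DOI_Service", "10.Service/Latest"),
    ],
    "10.Service/Latest": [("Desc", "Some description"), ("WSDL", "Soap Binding")],
    "10.Service/Metadata": [("Desc", "Some description"), ("WSDL", "Soap Binding")],
}


def values(handle: str, kind: str) -> list:
    """All data values of one type in a handle record (empty if unknown)."""
    return [data for k, data in records.get(handle, []) if k == kind]


def first_field(value: str) -> str:
    """Assumed convention: the target handle is the first ';'-separated field."""
    return value.split(";")[0].strip()


def services_for(doi: str) -> list:
    """Follow DOI -> DOI_AP -> DOI_Service and collect service handles."""
    found = []
    for ap_value in values(doi, "DOI_AP"):
        for svc_value in values(first_field(ap_value), "DOI_Service"):
            svc = first_field(svc_value)
            if svc not in found:
                found.append(svc)
    return found


print(services_for("10.123/456"))
```

Because 10.AP/1 has no record in this sketch, it contributes no services; a client that later learns about it would pick up its services without any change to the content provider's record.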

Who is using it now?

• Several hundred organisations
• Several million DOIs
• Examples:
– CrossRef
– Content Directions Inc
– TSO (The Stationery Office)
– and others (Europe, US, Asia)

Who has done this?

• International DOI Foundation (IDF)
• Open member organisation, launched 1998
• Members: publishing, technology, intermediaries
• Modelled on W3C, and on the bar code development
• www.doi.org

More information?

• Web site at http://www.doi.org
• DOI Handbook [http://www.doi.org/hb.html]
• DOI news [e-mail sign-up on site]
• DOI FAQs [http://www.doi.org/faq.html]
• Metadata:
– indecs framework [http://www.indecs.org]
– “Towards a Rights Data Dictionary” [http://www.doi.org/topics/020522IMI.pdf]


Appendix

• Supplementary material
• DOI Application Profiles concept
• Supporting IDF: benefits
• IDF development path
• DOI and internet standards

DOI Application Profiles

Each Profile can be thought of as built from the kernel + extensions:
• Compulsory kernel for any DOI
• DOI AP: metadata for the application (e.g. AP10)

Metadata elements

• An application may be defined in terms of another scheme, e.g. ONIX
• There must be a mapping for each element, e.g. ONIX “Page” = iid 734 (DOI Term Set)
• DOI users can see the metadata as all defined in DOI terms
• The advantage is in additional schemes/mappings (e.g. AP10 and AP27)

[Diagrams: AP10 defined in ONIX terms and mapped to the DOI Term Set; AP10 and AP27 side by side]
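The value of a single term set shows up when a second scheme is mapped to it: each scheme needs one mapping to the DOI terms rather than one per other scheme. The Python sketch below uses the slide's one concrete mapping (ONIX “Page” = iid 734); the second scheme "OtherScheme", its element name "pg", and both function names are hypothetical.

```python
# (scheme, element name) -> DOI Term Set identifier (iid).
TERM_MAP = {
    ("ONIX", "Page"): 734,       # mapping given on the slide
    ("OtherScheme", "pg"): 734,  # hypothetical second scheme, for illustration
}


def to_doi_terms(scheme: str, record: dict) -> dict:
    """Re-key a record from scheme element names to DOI term ids.

    Elements with no mapping are dropped, which is why every element
    of an Application Profile must have a mapping.
    """
    return {
        TERM_MAP[(scheme, name)]: value
        for name, value in record.items()
        if (scheme, name) in TERM_MAP
    }


def translate(record: dict, source: str, target: str) -> dict:
    """Go from one scheme to another by way of the DOI term ids."""
    reverse = {iid: name for (s, name), iid in TERM_MAP.items() if s == target}
    return {
        reverse[iid]: value
        for iid, value in to_doi_terms(source, record).items()
        if iid in reverse
    }


onix_record = {"Page": "42"}
print(to_doi_terms("ONIX", onix_record))
print(translate(onix_record, "ONIX", "OtherScheme"))
```

With N schemes this needs N mappings into the term set, where pairwise translation would need on the order of N squared.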

Benefits of supporting DOIs

• Persistent identification
– Not just a location
– Permanent, trackable name
– Stays the same if ownership, location, or control changes
– No need to update customers if the location changes
• Can incorporate existing identifiers
– Standard, e.g. ISBN, ISSN, ISMN, SICI, ISRC
– Non-standard / public, e.g. PII
– Private, e.g. workflow, internal production
– Assigned by the publisher, or on the publisher's behalf
• Can interoperate metadata standards
– Application profiles, kernel metadata, indecsDD

Using DOIs

• Automated link from DOI to any (and multiple) points
– Controlled by the assigner
– e.g. multiple locations, purchase options, additional information, and access control can be made available and controlled globally by the publisher, and can be invoked globally by an intermediary, etc.
• Build your own custom features: entirely extensible architecture
• Generic applicability: any form of intellectual property, any granularity (text, music, audio, ...)
– Simple standard metadata associated with each DOI to ensure interoperability
• Conforms to, and works with, existing standards

Business benefits

• Promotes ready use of material in a legal, controllable manner
• Proven, implemented, real system in use now
– e.g. CrossRef: 160+ publishers, around 3 million DOIs per year since Jan 2001, around 2 million resolutions per month, supports existing businesses
• Demonstrated unique additional features
– multiple resolution; DOI-APs
– use of these limited only by your imagination
• Low risk
– not a proprietary system; available at low cost
– controlled by a neutral, not-for-profit Foundation with a single aim
– built on open standards
– comprehensive effort reduces risk of "dead-end": Asia as well as EU, US; multimedia, e.g. text, music, software

Leverage other activities

IDF participates in other efforts:
• W3C, IETF DRM activities
• PRISM, ONIX, indecs2, ...
• ISO TC46, ISO MPEG
• NISO, WIPO, etc.
• Music industry: GRID, CR Forum
• Content ID Forum (Japan)
• Indecs
• TV Anytime
• etc.

No one company can participate in all these.

Why is support needed?

• if this is desirable, it must be paid for
• membership supports development until the operating federation takes over
• community invests now to get benefit for all
• coordinated work to provide efficient operation
• ensure consistent deployment and avoid fragmentation
• prevent conflicts and promote efficiency
• outreach to other efforts

Benefits of supporting IDF

• Ensure the DOI is widely implemented
– Existing applications need the underpinning of consistent rules, infrastructure, and wide uptake
• Ensure the content community sets standards
– Technology standards are not enough (Napster)
– No other existing forum is doing this: W3C, OEBF, MPEG-21, etc. are all looking at parts
• DOI results from extensive work by AAP, IPA, STM (1997+): a consistent development path
• IDF has a strong position, and support
– Content and technology communities are represented
• Promote collaboration
– interoperate with others; reduce costs, prevent mistakes
– provide a common platform but retain the ability to build added-value services

Benefits of supporting IDF

• Cost-effective way of gaining access to expertise
– Cost is equivalent to 2-3 man-days per month of one consultant (even at the highest membership level)
– Detailed monthly briefings on other activities (WIPO, W3C, IETF, MPEG, ISO, OEBF, SIIA, etc.), and more expertise available on request
• Preferential access to business opportunities
– IDF makes connections between members and potential applications: explore possible business opportunities at low risk
– Early access to results of prototypes, plans
• Share the cost of development of prototypes
– Costs can be shared by participants
• Influence the course of the IDF
– participate in working groups, annual meeting, prototypes, board

Registration Agencies

• An additional business opportunity for some members
• Build on the features and acceptance of the system
– build on existing services or offer new services
– management of content, management of metadata, etc.
• RAs may build as little or as much as they wish on this
– simple assignment, through to a wide range of services
• RAs determine their own fate
– IDF provides a federal structure for infrastructure, predictable costs and governance model
– open market structure for applications
• Business opportunity is a shared risk
– DOI service supported by multiple RAs and multiple applications
– Shared costs of the infrastructure
– common infrastructure encourages common added-value tools

DOI: development path

[Diagram labels: a continuing development activity; activity tracking; initial implementation; full implementation; single redirection (persistent identifier); metadata; multiple resolution; W3C, WIPO, NISO, ISO, UDDI, etc.]

DOI: components

• A number (or “name”)
– assign a number to something
– (compare: telephone number)
• A description
– what the number is assigned to
– (compare: directory entry)
• An action
– make the number do something
– (compare: the telephone system)
• Policies
– how to get a phone number; billing
– (compare: social structures)

Our aim: Building infrastructure

“Imagine a country where nobody can identify who owns what, addresses cannot easily be verified, people cannot be made to pay their debts, resources cannot conveniently be turned into money, ownership cannot be divided into shares, descriptions of assets are not standardized and cannot easily be compared, and the rules that govern property vary from neighbourhood to neighbourhood or even street to street. You have just put yourself into the life of a developing country or former communist nation”

“One of the most important things a formal property system does is transform assets from a less accessible condition to a more accessible condition, so that they can do additional work. Unlike physical assets, representations are easily combined, divided, mobilized, and used to stimulate business deals. By uncoupling the economic features of an asset from their rigid, physical state, a representation makes the asset "fungible" - able to be fashioned to suit practically any transaction.”

Both from “The Mystery of Capital: Why Capitalism Succeeds in the West and Fails Everywhere Else” by Hernando de Soto (2000)

DOI: provide the tools for representations of intellectual property

Internet standards: DOI, URN and URL

• Distinguish two issues:
1. The technical specification of “what is” a URN and a URI
2. What this means for practical implementation

1. Internet specs

• See DOI Handbook chapter 4
– 4.9 DOI as a URI
– 4.10 DOI as a URN
– equally true of all HDLs: DOIs are HDLs
• Aim: DOIs are persistent across time and unique across network space
• DOIs are URIs (formally: draft specification)
• DOIs are URNs (in effect)
• URN and URI proponents disagree
– “the URN wars”

1. Internet specs

[Diagram: URI covering both URN (urn:) and URL (ftp:, gopher:, http:), with resolution (N2L) between them. Source: http://www.w3.org/addressing (but largely from IETF; W3C did not see the need for URN)]

DOI as URI

• IETF formal spec “URI scheme for Digital Object Identifier”
– Paskin, Norman; Neylon, Eamonn; Hammond, Tony; Sun, Sam; Uniform Resource Identifier (URI) scheme for Digital Object Identifiers (DOIs); http://www.ietf.org/internet-drafts/draft-paskin-doi-uri-00.txt (February 2002)
– An abstract specification (uri:doi:)
– Would be doi: (like tel:) [uri: is not part of the URI spec, unlike urn:]
• May be a pure name or de-referenced by any service
– The namespace provides its own mechanism (“bootstrapping”)
• RFC 2396: UTF-8 encoding allows non-Roman characters
• On its own, it’s just a specification!
• Requires code distribution for any implementation
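How a DOI containing spaces or non-Roman characters might be written as a URI can be sketched with Python's standard library. The helper names are hypothetical, the proxy address is the one mentioned elsewhere in this presentation, and the exact escaping rules of the draft doi: scheme are not reproduced here: the sketch simply percent-encodes the UTF-8 bytes and leaves "/" unescaped.

```python
from urllib.parse import quote


def doi_uri(doi: str) -> str:
    """Write a DOI in the proposed doi: form, percent-encoding as UTF-8."""
    return "doi:" + quote(doi, safe="/")


def proxy_url(doi: str, proxy: str = "http://dx.doi.org/") -> str:
    """Write the same DOI as an HTTP URL through a proxy server."""
    return proxy + quote(doi, safe="/")


print(doi_uri("10.1000/ISBN 0764548891"))  # space becomes %20
print(proxy_url("10.1234/é"))              # é becomes %C3%A9 (its UTF-8 bytes)
```

The doi: form is only a name until some software knows how to resolve it, which is the point of the last two bullets above; the proxy form works in any browser today.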

DOI as URN

• URN is less clear:
– Higher-level situation muddy
– Set of IETF drafts that define URN
– Set of registered namespaces (e.g. isbn)
• DOI could be but isn’t: no advantage
• Unlike URI, provides a specific DNS-based middle layer (RDS) to find the appropriate resolution service
• Scalability and security questioned; and:
• Little or no resolution implementation
– Resolution proposed is one specific way:
– NAPTR (Name Authority Pointer) turns urn:hdl:10.1000/1 into http://hdl.handle.net/10.1000
– Recently DDDS (Dynamic Delegation Discovery System): variant of NAPTR

DOI as URN

• urn:isbn:123456789 can be defined; but what does it do over and above isbn:123456789?
– neither has a readily available, well-known, global resolution
• What if NAPTR were widely deployed (5 years on)?
• Some advantage: could redirect from one URL proxy to another
– urn:doi to http://dx.doi.org/, redirected to http://dx2.doi.org
• But this is a “regular expression”: not software
• And still worries about DNS issues
– “Gratuitous use of DNS”
– DNS name servers are widely distributed: inertia
– No security of resolution
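The remark that NAPTR-style resolution is "a regular expression, not software" can be made concrete: the rule is a pattern and a replacement template, and nothing more. The Python sketch below imitates that with two rewrite rules. The rules themselves are assumptions for illustration (real NAPTR records live in DNS and have their own syntax), and the handle rule keeps the whole handle in the output URL.

```python
import re
from typing import Optional

# Hypothetical rewrite rules: (pattern, replacement template).
REWRITE_RULES = [
    (re.compile(r"^urn:hdl:(.+)$", re.IGNORECASE), r"http://hdl.handle.net/\1"),
    (re.compile(r"^urn:doi:(.+)$", re.IGNORECASE), r"http://dx.doi.org/\1"),
]


def rewrite(urn: str) -> Optional[str]:
    """Apply the first matching rule; return None if no rule matches."""
    for pattern, template in REWRITE_RULES:
        if pattern.match(urn):
            return pattern.sub(template, urn)
    return None


print(rewrite("urn:hdl:10.1000/1"))
print(rewrite("urn:doi:10.1234/5678"))
print(rewrite("isbn:123456789"))  # no rule, so no resolution
```

Changing the second template to point at another proxy is the "redirect from one URL proxy to another" advantage; the rewrite still only yields a URL, and says nothing about the security or availability of what answers there.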

1. Internet specs

• Persistence across time and network space is desirable
• Do not want to bet on the URN logic of putting a resolution system in front of resolution systems
– Especially the one proposed
• But
– DOIs ARE URIs (formally)
– DOIs ARE URNs (in effect)
• But: this is not the most important issue!

2. Practical implementation

• Irrespective of all this URI/URN specification, DOIs are still needed, still useful, still valid
• A DOI is more than an HDL
– Adds policy, business rules, business model
– Adds metadata specifications (cf. ISBN, EAN, Visa)
• e.g. mappings:
– Ensures semantic integrity
– A technical exercise:
– A term is assigned a unique value in the iDD
– Given a genealogy and ContextDescription
– Other information added
– A mapped term becomes part of the dictionary
• Hence will become more useful as it grows
– Consensual between the two things being mapped
– Painstaking, but once-only
– Specialist services requiring intellectual input

2. Practical implementation

• On this topic, see:
– DOI Handbook Ch. 3.6: Social infrastructure
– DOI Handbook Ch. 6: on the Handle System and using HDL without DOIs
– DOI Handbook Ch. 13: on RAs and using DOIs without RAs
