Digital Object Identifier

Charles Ellis: Chairman, International DOI Foundation

Norman Paskin: Director, International DOI Foundation

Steve Stone: Director, Microsoft eBook Product Group

Eric Swanson: Chairman, CrossRef

doi>

2

Outline

• Background: why DOI
• What the DOI system consists of
• DOI explained: what it does
• Applications

3

Background: why now?

• Identifiers enable us to manage content
• Physical world: ISBN, ISSN, ISMN, SICI, etc.
– good systems for publishers
• Digital world: ? URL?
– poor systems for publishers
– how to use existing identifier systems?
• Make WWW transactions as invisible as telephone transactions
– machine to machine,
– not machine to people to machine

4

The intellectual property background

Digital world enables both use and misuse

• Publishers' aim is to maximise the value of information objects:
- reduce copy infringement and
- increase accessibility;
- we need to identify in order to manage content
• Mass production → mass customisation
- a la carte / on demand publishing
- components must be clearly identifiable
- and rights properties of them automated

5

Background: the organisation

• International DOI Foundation: founded 1998
– following demonstration of prototype in 1997
• Not-for-profit; paid membership support
– similar principles to World Wide Web Consortium
• Open to all interested parties
• Democratic: board elected from members
• Full time Director
• 35+ organisations (growing)
– Content owners (text publishers, music)
– Technology companies
– Content intermediaries (etc)

6

DOI: requirements

• Identification of content
– intellectual property in any form
• Actionable identification
– automation; “click to do something”
• Interoperability
– existing identification systems
– future developments
• Open standard
– compatible with other standards

7

DOI: the aim

• Establish a way of identifying content in the digital environment
– actionable identifier
• Which can be the basis of rights management
– extensible; can be developed further

11

Components of an identifier

• A number (or “name”)
– assign a number to something
– (compare: telephone number)
• A description
– say what the number is assigned to
– (compare: directory entry)
• An action
– make the number able to do something
– (compare: the telephone system)
• Policies
– (compare: social/business structures)

12

POLICIES

NUMBERING
Syntax: 10.1234/5678

DESCRIPTION
Metadata: pieces of data which uniquely describe that which is identified

ACTION
Resolution: system able to link the number to something useful

13

1. Numbering

• DOI syntax: how the number is made up
- NISO standard (Z39.84)
- 10.1000/12345
• 10.1000 = prefix (e.g. a publisher, a journal, etc)
• 12345 = suffix (combination is unique)
• An opaque string (“a dumb number”)
– once assigned, parts of the number do not have separate meaning
• Permanent
– stays the same even if ownership changes
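The syntax above can be sketched in a few lines of Python. This is a minimal illustration, not a validator for the NISO standard: the name `parse_doi` is hypothetical, and the check that a prefix begins with "10." is an assumption drawn from the examples on these slides. Splitting at the first "/" only separates prefix from suffix; the suffix itself stays opaque.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # e.g. "10.1000", assigned to a publisher, a journal, etc.
    suffix: str  # e.g. "12345", chosen so that prefix + suffix is unique


def parse_doi(text: str) -> Doi:
    """Split a DOI string into prefix and suffix at the first "/"."""
    prefix, sep, suffix = text.strip().partition("/")
    if not sep or not suffix:
        raise ValueError(f"expected prefix/suffix, got {text!r}")
    # Assumption: prefixes look like "10.<something>", as in the slide examples.
    if not prefix.startswith("10.") or len(prefix) <= 3:
        raise ValueError(f"unexpected prefix: {prefix!r}")
    return Doi(prefix, suffix)


doi = parse_doi("10.1000/12345")
print(doi.prefix, doi.suffix)  # 10.1000 12345
```

Nothing in the sketch interprets the suffix, which matches the "dumb number" principle: software should compare DOIs as whole strings rather than read meaning into their parts.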

14

2. Description

• “What is numbered?”

• Not as simple as you might think:

1. Not only digital files, but physical things and intangible things!

2. Not only things, but parts of things!

• Let’s explain these:

15

Manuscript: mss #ABC123

paper journal/volume/page

Not only digital things...

16

MS

Vol/page; ISBN; SICI, etc

URL

“intangible abstraction”

19

Not only things, but parts of things

• Components
• Book
– Chapter
• Section
– Figure
• “Granularity”
• Must be able to identify at whatever level is appropriate: functional granularity

20

Description is by metadata

• Metadata is: data about other data
- Book: ISBN 0864426437 (data)
- Price: $12.95 (metadata)
- Subject: Buenos Aires (metadata)

• One man’s metadata is another man’s data:

21

Description is by metadata

• Data about other data
- Subject: Buenos Aires (data)
- Book: ISBN 0864426437 (metadata)
- Price: $12.95 (metadata)
• Part of an infinite web:
– interconnected
– infinite in extent
• inextricable from “identification”

22

Description is by metadata

• Not sufficient to assign an identifier without specifying precisely what the entity is

– “a paper” or “a book” is not precise enough;

– must be precise, because:

• In an automated world, that specification must be by metadata (able to be used by machines)

• In an interoperable world, that metadata must be

– unambiguous (“well-formed”)

– follow a data model

(able to be used consistently by machines)

23

DOI uses <indecs> framework

Interoperability of data in e-commerce systems
• Focus is generic intellectual property management
• Enabling, not replacing, other schemes
• Broad in scope
– description, transaction, rights
• Based on tested “real world” models, wide support
– CIS (music industry); IFLA (library cataloguing)
• Now in use in real applications
– Muze (audiovisuals), EPICS/ONIX (books & serials)
• Extensible, structured, open standard

24

DOI metadata is very simple

• A few (7-8) key pieces of data
– title, type of content, origin, etc
– varies according to what is needed (video, book, etc)
• about the object
– does not include rights metadata
• but interoperates with rights data
– because based on same data model
– uses the same terms to mean the same thing
• analogy: telephone bill = rights information
– the telephone number your bank account
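As a rough illustration of what "a few key pieces of data" could look like to a machine, here is a minimal Python sketch. The class `DoiMetadata`, its field names, the `RIGHTS` table, and all values are hypothetical; they are not the <indecs> data model or any actual DOI metadata schema. The point shown is only that descriptive metadata stays separate from rights data while both are keyed by the same identifier.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DoiMetadata:
    """A handful of descriptive fields about the object (no rights data)."""
    doi: str
    title: str
    content_type: str  # "type of content", e.g. "book" or "video"
    origin: str        # hypothetical free-text field for where the item comes from


def missing_fields(record: DoiMetadata) -> list[str]:
    """Return the names of fields left empty, so a machine can reject the record."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Rights information is held elsewhere and joined on the same identifier.
RIGHTS = {"10.1000/123": {"terms": "hypothetical licence terms"}}

record = DoiMetadata("10.1000/123", "An example title", "book", "Example Publisher")
print(missing_fields(record))       # []
print(RIGHTS[record.doi]["terms"])  # hypothetical licence terms
```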

25

3. Actions

[Diagram: a user in a Web browser follows an actionable identifier (doi> 10.1000/123), which leads to a specified action, etc.]

26

DOI uses Handle System®

• Open Standard using internet
• Distributed, scalable, fast and reliable
• In use now in several places (e.g. Lib. of Congress)
• Very simple concept, powerful applications
• Fits with other standards (URL, URN, etc)
• Associates a name with “values” (e.g. URL)
– input DOI
– output URL (or some other defined value)
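The "name in, values out" idea can be sketched with an in-memory table. This is a toy stand-in, not the Handle System protocol or its API: the class `Resolver`, its methods, and the example URLs are all hypothetical. It shows a name associated with typed values, and that re-pointing a name changes only the stored value, never the name.

```python
from collections import defaultdict


class Resolver:
    """Toy in-memory stand-in for a resolution service (not the Handle System API)."""

    def __init__(self) -> None:
        # name -> data type -> list of current values
        self._values: dict[str, dict[str, list[str]]] = defaultdict(dict)

    def set_values(self, doi: str, data_type: str, values: list[str]) -> None:
        """Store (or replace) the current values of one data type for a name."""
        self._values[doi][data_type] = list(values)

    def resolve(self, doi: str, data_type: str = "URL") -> list[str]:
        """Input: a DOI. Output: its current values of the requested type."""
        if doi not in self._values:
            raise KeyError(f"unknown DOI: {doi}")
        return list(self._values[doi].get(data_type, []))


resolver = Resolver()
resolver.set_values("10.1000/123", "URL", ["http://www.pub.example/old"])
print(resolver.resolve("10.1000/123"))  # ['http://www.pub.example/old']

# The content moves: the publisher updates the value, the DOI stays the same.
resolver.set_values("10.1000/123", "URL", ["http://www.pub.example/new"])
print(resolver.resolve("10.1000/123"))  # ['http://www.pub.example/new']
```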

27

Using Handle, DOIs Resolve to Multiple Data Types

INPUT: Handle (DOI) 10.1004/123456

OUTPUT:
Data type    DOI data
URL          http://www.pub.com/.
URL          http://www.pub2.com/.
DLS          loc/repository
XYZ          1001110011110 (extensible data types)

28

For convenience we re-draw like this:

INPUT: doi> 10.1000/123

OUTPUT: URL, URL2, RAP, XYZ, etc.

29

4. Policies

• DOI free to use
– costs paid by assigner
• DOI applies to any Intellectual Property entity
– copyright focus (Berne/WCT etc)
• Registration agencies to deal with assigning DOIs (and metadata/resolution) for publishers etc
• Business models determined by agencies
• Policies for agencies are now evolving

30

POLICIES

ENUMERATION
• Allocation of an identifier (DOI)

DESCRIPTION
<indecs> framework allows a DOI to describe any form of intellectual property, at any level of granularity

RESOLUTION
Handle System allows a DOI to resolve to any piece of current data

doi>

31

What is DOI?

Digital Object Identifier

• A unique identifier…
- of a piece of intellectual property
- in any form (tangible, intangible)
- defined by some key metadata
- an opaque string e.g. DOI:10.1000/123

32

What is DOI?

• “resolvable…”
- routing, via proven internet technology,
• “to associated state data…”
- one or more current values of specified types of data (e.g. URL);
- these data may be, or link to, services

33

What is DOI?

• “in an information management substrate…”

- once the (meta)data has been obtained, it can interoperate with other data

- e.g. about context (subscription etc)

- to construct services and transactions

- because (meta)data follows a generic interoperable architecture

34

What is DOI?

“A unique resolvable identifier and multiple pieces of associated state data in an information management substrate” achieved by:

• Technical implementation + policies

• Two underlying technical tools:
1. intellectual property: <indecs> framework
2. resolution: Handle System

35

What are the advantages?

1. Identify the item of intellectual property
• not its location, because:
• if the location changes the identifier should stay the same (persistence)
• the same “resource” can be at several locations at the same time (“multiple copies”)

DOI does this

36

The problem illustrated on the Web

[Diagram: a user's Web browser follows a URL and gets “404 not found” or “...has moved to…”]

1. URL is not a persistent identifier - it refers to location, not content
2. Same content at two different URLs has two different identifiers - cannot use as common reference

“One in five Web links more than one year old may be out of date” (Alta Vista)

37

Identifiers on the Web

[Diagram: a user's Web browser follows a URL]

1. Don’t change the URL; “persistence is a social, not a technology, problem”

• People do change URLs
• There are good reasons to change URLs
• Does not deal with multiple copies

38

Identifiers on the Web

Making identifiers persistent

2. Assign a Name (= identifier) and redirect for “has moved to..”

[Diagram: a user's Web browser follows a name, which redirects over http to the URL]

• Bookmarks and caches save the end point, not the name (in current browsers)
• Still does not deal with multiple copies

39

Identifiers on the Web

3. Assign a Name (DOI) and use a better resolver

[Diagram: a user's Web browser follows a doi> name, which resolves to one or more URLs]

• DOI provides name
• One point of management
• Multiple resolution

40

This is the DOI: initial implementation

[Diagram: a user's Web browser follows doi> 10.1000/123; resolution returns a URL]

1. DOI is a persistent identifier
2. DOI identifies the content, irrespective of the location

41

Full DOI implementation: adding multiple resolution

[Diagram: a user's Web browser follows the actionable identifier doi> 10.1000/123, which resolves to URL, URL2, Data 1, Data 2, etc.]

Identifier resolves to any piece of data

42

Multiple resolution for performance: (e.g. D-Lib magazine)

[Diagram: a user's Web browser follows doi> 10.1000/123, which resolves to URL1, URL2, URL3, URL4, etc.]

Identifier resolves to all URLs; the first to respond is chosen
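The "first to respond is chosen" rule can be sketched as a race between the resolved locations. This is a simulation under stated assumptions, not how any particular client is implemented: the mirror URLs and delays are invented, `probe` sleeps instead of making a network request, and `first_to_respond` is a hypothetical helper name.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical mirrors and simulated response delays in seconds.
SIMULATED_DELAY = {
    "http://mirror-a.example/item": 0.30,
    "http://mirror-b.example/item": 0.05,
    "http://mirror-c.example/item": 0.20,
}


def probe(url: str) -> str:
    """Stand-in for a network request: sleep, then report which URL answered."""
    time.sleep(SIMULATED_DELAY[url])
    return url


def first_to_respond(urls: list[str]) -> str:
    """Contact every URL at once and return whichever answers first."""
    if not urls:
        raise ValueError("no URLs to try")
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = [pool.submit(probe, url) for url in urls]
        # as_completed yields futures in the order they finish.
        # Leaving the "with" block still waits for the slower probes to end.
        for future in as_completed(futures):
            return future.result()
    raise RuntimeError("unreachable: at least one probe always completes")


print(first_to_respond(list(SIMULATED_DELAY)))  # http://mirror-b.example/item
```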

43

Multiple resolution for intelligence: “services”

[Diagram: a user's Web browser follows the actionable identifier doi> 10.1000/123, which resolves to URL, URL2, Data 1, Data 2, etc.; “Service 1 @ 10.1000/123” carries out a specified action]

44

What are the advantages?

2. Able to deal with relationships:
– “this item is a manifestation of that work”
– “this item is a part of that item”

DOI does this:
• DOIs can resolve to other DOIs
• Metadata can express relationships
– “is part of…” etc
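Both mechanisms, DOIs resolving to other DOIs and metadata expressing "is part of", can be sketched with a small table. The records, the key names (`url`, `doi`, `is_part_of`), and the helper functions are hypothetical, chosen only for illustration; real DOI metadata and resolution records are not structured this way.

```python
# Hypothetical records: each DOI resolves either to a URL or to another DOI,
# and its metadata may state that it is part of another item.
RECORDS = {
    "10.1000/100": {"url": "http://www.pub.example/book"},               # a book
    "10.1000/101": {"doi": "10.1000/100", "is_part_of": "10.1000/100"},  # a chapter
    "10.1000/102": {"doi": "10.1000/101", "is_part_of": "10.1000/101"},  # a figure
}


def resolve_to_url(doi: str) -> str:
    """Follow DOI-to-DOI resolution until a URL is reached."""
    seen = set()
    while True:
        if doi in seen:
            raise ValueError(f"resolution loop at {doi}")
        seen.add(doi)
        record = RECORDS[doi]
        if "url" in record:
            return record["url"]
        doi = record["doi"]


def containing_items(doi: str) -> list[str]:
    """List the items this one is part of, nearest first."""
    chain = []
    while "is_part_of" in RECORDS[doi]:
        doi = RECORDS[doi]["is_part_of"]
        chain.append(doi)
    return chain


print(resolve_to_url("10.1000/102"))    # http://www.pub.example/book
print(containing_items("10.1000/102"))  # ['10.1000/101', '10.1000/100']
```

The suffixes are deliberately plain numbers: the part-of relationship lives in the metadata, not in the identifier string.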

45

DOI networks can reflect the complex relationships of publishing

[Diagram: a network of doi> nodes linked to URLs (URL, URL2) and services (Service A, Service B)]

46

What are the advantages?

3. Apply to any intellectual property entity
– any format (digital convergence)
– any granularity (any part of something)

4. Enable complex actions
– can express relationships between entities
– interact with data from other sources
– enables services (automated, predictable) to be constructed

47

What are the advantages?

5. Extensible
• resolution system has capability for trusted transactions
• metadata framework has capability for full rights management architecture

6. Not limited to current environments
• not just the Web (other Internet applications)
• not just digital (intangibles etc)

48

DOI: development in three tracks

• Initial implementation: single redirection
• Full implementation: metadata; multiple resolution
• Standards tracking: W3C, WIPO, NISO, ISO, etc, other initiatives

A continuing development activity

49

Applications

• Reference linking of articles
- CrossRef (full scale DOI implementation, not run by IDF); metadata, single resolution
• E-books
– currently being worked on (with ONIX/EPICS)
• Images
– BioImage; others
• Books
• Audiovisuals
• etc.

50

DOI Deployment

• DOI Foundation to provide governance
– using a federation of registration agencies
– agencies follow agreed rules (policies)
• minimum criteria for registration agencies:
– technical; information management; $
• does not prescribe details of individual businesses
• comparable models:
– Bar codes (EAN/UPC); Visa; ISBN etc.

51

Summary

• A general purpose identifier system
– number, description, action and policies
• Any item, at any desired level
– using a metadata framework
• Linking to any service or data
– using resolution (multiple resolution)
• Simple to use
– registration agencies
• Applications and agencies now happening

52

Further information

• DOI background papers & DOI Annual Review, FAQs, gallery, etc
– www.doi.org
• <indecs>
– www.indecs.org
• Handle system
– www.handle.net
• [email protected]

doi>