Signposting for Repositories @mart1nkle1n, @hvdsomp OR 2017, 06/29/2017, Brisbane, AUS Signposting for Repositories Martin Klein @mart1nkle1n http://orcid.org/0000-0003-0130-2097 Herbert Van de Sompel @hvdsomp http://orcid.org/0000-0002-0715-6126 Research Library Los Alamos National Laboratory http://signposting.org Signposting is funded by the Andrew W. Mellon Foundation Cartoon by Patrick Hochstenbach

Page 1: Signposting for Repositories

Signposting for Repositories

@mart1nkle1n, @hvdsomp

OR 2017, 06/29/2017, Brisbane, AUS

Signposting for Repositories

Martin Klein @mart1nkle1n

http://orcid.org/0000-0003-0130-2097

Herbert Van de Sompel @hvdsomp

http://orcid.org/0000-0002-0715-6126

Research Library

Los Alamos National Laboratory

http://signposting.org

Signposting is funded by the Andrew W. Mellon Foundation

Cartoon by Patrick Hochstenbach

Page 2: Signposting for Repositories

10.1594/PANGAEA.867908

Page 3: Signposting for Repositories

10.1594/PANGAEA.867908

http://dx.doi.org/10.1594/PANGAEA.867908

Page 4: Signposting for Repositories

https://doi.pangaea.de/10.1594/PANGAEA.867908

Page 10: Signposting for Repositories

Problems

• Humans can easily navigate such links, i.e.:
  • Copy the DOI and resolve it via dx.doi.org
  • Determine where the bibliographic resources are
  • Interpret the download link for the dataset ZIP
  • Search for authors’ names on the web

Page 11: Signposting for Repositories

Problems

• Humans can easily navigate such links, i.e.:
  • Copy the DOI and resolve it via dx.doi.org
  • Determine where the bibliographic resources are
  • Interpret the download link for the dataset ZIP
  • Search for authors’ names on the web
• Machines can’t do any of this!

Page 12: Signposting for Repositories

HTTP Links

Mark Nottingham (2010) RFC5988: Web Linking.

https://tools.ietf.org/rfc/rfc5988.txt

Page 13: Signposting for Repositories

HTTP Links

Page 14: Signposting for Repositories

HTTP Links

Page 15: Signposting for Repositories

HTTP Links Are Used

curl -I http://dbpedia.org/data/Reykjavik

HTTP/1.1 200 OK
Date: Thu, 27 Oct 2016 04:43:28 GMT
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 1210
Link: <http://creativecommons.org/licenses/by-sa/3.0>; rel="license",
  <http://dbpedia.org/data/Reykjavik>; rel="alternate"; type="text/n3",
  <http://dbpedia.org/resource/Reykjavik>; rel="describes",
  <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/data/Reykjavik>; rel="timegate"
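A machine agent would consume such a response by splitting the Link header into targets and parameters. The following is a minimal Python sketch, not a complete RFC 5988 parser: the function name parse_link_header and both regular expressions are assumptions made for illustration, and edge cases such as escaped quotes or commas inside parameter values are ignored.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
)
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return a list of (target_uri, {param: value}) tuples (simplified)."""
    links = []
    for target, raw_params in LINK_VALUE.findall(value):
        params = {
            name.lower(): quoted or bare
            for name, quoted, bare in PARAM.findall(raw_params)
        }
        links.append((target, params))
    return links


# Link header value taken from the DBpedia response on this slide.
EXAMPLE = (
    '<http://creativecommons.org/licenses/by-sa/3.0>; rel="license", '
    '<http://dbpedia.org/data/Reykjavik>; rel="alternate"; type="text/n3", '
    '<http://dbpedia.org/resource/Reykjavik>; rel="describes", '
    '<http://mementoarchive.lanl.gov/dbpedia/timegate/'
    'http://dbpedia.org/data/Reykjavik>; rel="timegate"'
)

for target, params in parse_link_header(EXAMPLE):
    print(params.get("rel"), "->", target)
```

Each printed line pairs a relation type with its target URI, which is the information an agent needs to decide which link to follow.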

Page 16: Signposting for Repositories

HTTP Link Relation Types

• Registered in IANA registry
  • Strings, e.g., license, alternate, describes, timegate
  • Requires a formal specification, e.g., an RFC
  • Typically used for common relationships, generically specified
  • Provides broad, coarse-grained interoperability
• Minted by a community
  • URIs, e.g., http://xmlns.com/foaf/0.1/primaryTopic
  • Requires community agreement
  • Can be as specific as desired
  • Can provide community-specific, fine-grained interoperability
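The two kinds of relation types can be told apart by their shape alone. The Python sketch below is an illustration under stated assumptions: the function name classify_relation_type and the three return labels are made up for this example, and matching the token grammar only shows that a string looks like a registered name, not that it actually appears in the IANA registry.

```python
import re
from urllib.parse import urlsplit

# Shape of registered relation type names per RFC 5988:
# a lowercase letter followed by lowercase letters, digits, "." or "-".
REGISTERED_NAME = re.compile(r"^[a-z][a-z0-9.-]*$")


def classify_relation_type(rel):
    """Classify a relation type by syntax only (no registry lookup)."""
    rel = rel.strip().lower()  # relation types are compared case-insensitively
    if REGISTERED_NAME.match(rel):
        return "registered-style"
    if urlsplit(rel).scheme:
        return "extension"  # community-minted relation types are URIs
    return "unknown"


print(classify_relation_type("license"))
print(classify_relation_type("http://xmlns.com/foaf/0.1/primaryTopic"))
```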

Page 17: Signposting for Repositories

HTTP Links Are Pretty Neat

• Can uniformly be used for all MIME types
• Accessible via HTTP HEAD (no content transfer):
  • Works for large resources and for restricted content
• HTTP Links can be conveyed:
  • by-value, in the HTTP Link header
  • by-reference, by using a linkset link in the HTTP header that points to a collection of links (1)
• HTTP Links provide guidance to machine agents intent on accomplishing a specific task

(1) Wilde, E. and Van de Sompel, H. (2017) Linkset: A Link Relation Type and Media Types for Link Sets. https://datatracker.ietf.org/doc/draft-wilde-linkset-link-rel/
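The HEAD-based access described on this slide can be sketched with Python's standard library. This is an assumption-laden illustration rather than a reference client: the helper names build_head_request and fetch_link_headers are hypothetical, and the opener parameter exists only so the function can be exercised without network access.

```python
import urllib.request


def build_head_request(url):
    """Create a HEAD request so that no response body is transferred."""
    return urllib.request.Request(url, method="HEAD")


def fetch_link_headers(url, opener=urllib.request.urlopen):
    """Return the raw Link header values of a resource (empty list if none)."""
    with opener(build_head_request(url)) as response:
        return response.headers.get_all("Link") or []


# Example call (requires network access, so it is left commented out):
# fetch_link_headers("https://doi.pangaea.de/10.1594/PANGAEA.867908")
```

Because only headers are exchanged, the same call works for a multi-gigabyte dataset as well as for a small landing page.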

Page 18: Signposting for Repositories

HTTP Links Alternative: Links in Resource Representation

• Can only be done for media types that support inclusion of typed links, e.g., HTML, using the <link> element in <head>
• Requires HTTP GET (content transfer)
• HTML <link> is accessible to JavaScript
• For HTML pages, use both HTTP Link and HTML <link>
• Links can be conveyed:
  • by-value, using HTML <link>
  • by-reference, by using an HTML <link> with the linkset relation type that points to a collection of links
• HTML <link> elements provide guidance to machine agents intent on accomplishing a specific task
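For the in-representation alternative, an agent has to fetch the page and read the <link> elements from <head>. The Python sketch below uses the standard html.parser module; the class name HeadLinkParser, the helper extract_head_links, and the sample markup are assumptions for illustration and do not reproduce any repository's actual HTML.

```python
from html.parser import HTMLParser


class HeadLinkParser(HTMLParser):
    """Collect the attributes of every <link> element found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_head_links(html_text):
    parser = HeadLinkParser()
    parser.feed(html_text)
    return parser.links


# Hypothetical landing page markup, used only to exercise the parser.
SAMPLE_HTML = """
<html><head>
<title>Example landing page</title>
<link rel="identifier" href="https://doi.org/10.1594/PANGAEA.867908">
<link rel="item" type="application/zip"
      href="https://doi.pangaea.de/10.1594/PANGAEA.867908?format=zip">
</head><body><link rel="ignored" href="x"></body></html>
"""

for link in extract_head_links(SAMPLE_HTML):
    print(link.get("rel"), "->", link.get("href"))
```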

Page 19: Signposting for Repositories

Signposting for Repositories

Proposal: Use HTTP Links to address some long-standing problems regarding scholarly resources on the web by interlinking them using appropriate relation types.

Page 20: Signposting for Repositories

bibliographic resources

constituent resources

HTTP PID

Page 21: Signposting for Repositories

Pattern: Identifier

• Problem: It is not possible to determine the associated HTTP PID of a scholarly object’s constituent resources
  • Landing page URIs used for citation (1)
  • Annotations do not refer to HTTP PID
• Solution: provide identifier link pointing at the HTTP PID
• Applies to: landing page, all constituent resources

(1) Van de Sompel, H., Klein, M., and Jones, S. (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/abs/1602.09102

Page 22: Signposting for Repositories

Page 23: Signposting for Repositories

Use HTTP Link with identifier Relation Type

curl -I "https://doi.pangaea.de/10.1594/PANGAEA.867908"

HTTP/1.1 200 OK
Content-length: 8424
Content-type: text/html;charset=UTF-8
Link: <https://doi.org/10.1594/PANGAEA.867908>; rel="identifier"

Page 24: Signposting for Repositories

Pattern: Publication Boundary

• Problem: It is not possible to determine what the constituent resources of a scholarly object are
  • Preservation and text mining tools require portal-specific heuristics to find those constituent resources (1)
  • No direct path from an HTTP PID to, e.g., the PDF
• Solution: provide item/collection links to interlink entry page and constituent resources; convey MIME types on item links
• Applies to: all constituent resources of a scholarly object

(1) Van de Sompel, H., Rosenthal, D., and Nelson, M.L. (2016) Web Infrastructure to Support e-Journal Preservation (and More). http://arxiv.org/abs/1605.06154

Page 25: Signposting for Repositories

Page 26: Signposting for Repositories

Use HTTP Link with item/collection Relation Type

curl -I "https://doi.pangaea.de/10.1594/PANGAEA.867908"

HTTP/1.1 200 OK
Content-length: 8424
Content-type: text/html;charset=UTF-8
Link: <https://doi.pangaea.de/10.1594/PANGAEA.867908?format=zip>; rel="item"; type="application/zip"
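On the repository side, emitting such a header amounts to serializing a list of targets and parameters. The Python sketch below shows one way this might look; the function name format_link_header and the input shape (a list of target/parameter pairs) are assumptions for illustration, and no escaping of special characters in parameter values is attempted.

```python
def format_link_header(links):
    """Serialize (target_uri, {param: value}) pairs into a Link header value."""
    parts = []
    for target, params in links:
        attrs = "".join(f'; {name}="{value}"' for name, value in params.items())
        parts.append(f"<{target}>{attrs}")
    return ", ".join(parts)


# The item link from the response on this slide, expressed as data.
ITEM_LINKS = [
    (
        "https://doi.pangaea.de/10.1594/PANGAEA.867908?format=zip",
        {"rel": "item", "type": "application/zip"},
    ),
]

print("Link:", format_link_header(ITEM_LINKS))
```

Keeping the links as data until the last step makes it straightforward to add further item links, each with its own MIME type, without touching the serialization.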

Page 27: Signposting for Repositories

Pattern: Bibliographic Metadata

• Problem: It is not possible to determine where the bibliographic resources that describe a scholarly object can be found
  • Preservation and reference manager tools require portal-specific heuristics to find those resources (1)
• Solution: provide describedby/describes links to interlink entry page and bibliographic metadata resources
• Applies to:
  • describedby: HTTP PID, landing page
  • describes: bibliographic resources

(1) Van de Sompel, H., Rosenthal, D., and Nelson, M.L. (2016) Web Infrastructure to Support e-Journal Preservation (and More). http://arxiv.org/abs/1605.06154

Page 28: Signposting for Repositories

Page 29: Signposting for Repositories

Use HTTP Link with describedby/describes Relation Type

curl -I "https://doi.pangaea.de/10.1594/PANGAEA.867908"

HTTP/1.1 200 OK
Content-length: 8424
Content-type: text/html;charset=UTF-8
Link: <https://doi.pangaea.de/10.1594/PANGAEA.867908?format=citation_ris>; rel="describedby"; type="application/x-research-info-systems",
  <https://doi.pangaea.de/10.1594/PANGAEA.867908?format=citation_bibtex>; rel="describedby"; type="application/x-bibtex"
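With several describedby links on offer, a reference manager can pick the metadata format it understands by looking at the type parameter. The Python sketch below assumes the links have already been parsed into target/parameter pairs; the function name pick_metadata_link and that input shape are illustrative assumptions, not part of any specification.

```python
# The two describedby links from the response on this slide, in parsed form.
DESCRIBEDBY_LINKS = [
    (
        "https://doi.pangaea.de/10.1594/PANGAEA.867908?format=citation_ris",
        {"rel": "describedby", "type": "application/x-research-info-systems"},
    ),
    (
        "https://doi.pangaea.de/10.1594/PANGAEA.867908?format=citation_bibtex",
        {"rel": "describedby", "type": "application/x-bibtex"},
    ),
]


def pick_metadata_link(links, preferred_types):
    """Return the first describedby target whose MIME type is preferred."""
    by_type = {
        params.get("type"): target
        for target, params in links
        # rel may carry several space-separated relation types
        if "describedby" in params.get("rel", "").split()
    }
    for mime_type in preferred_types:
        if mime_type in by_type:
            return by_type[mime_type]
    return None


print(pick_metadata_link(DESCRIBEDBY_LINKS, ["application/x-bibtex"]))
```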

Page 30: Signposting for Repositories

Pattern: Author

• Problem: It is not possible to uniquely determine who authored the work
• Solution: provide author link to interlink HTTP PID and author-identifying URI
• Applies to: HTTP PID, landing page, all constituent resources

Page 31: Signposting for Repositories

Page 32: Signposting for Repositories

Use HTTP Link with author Relation Type

curl -I "https://doi.pangaea.de/10.1594/PANGAEA.867908"

HTTP/1.1 200 OK
Content-length: 8424
Content-type: text/html;charset=UTF-8
Link: <http://orcid.org/0000-0003-1291-8524>; rel="author"

Page 33: Signposting for Repositories

Take-Aways

• HTTP Links and Relation Types help to:
  • Convey the (persistent) identifier of a resource
  • Inform about the boundaries of an object
  • Point at bibliographic metadata
  • Refer to an author-identifying resource
• Increase interoperability of repositories, embrace principles of the web
• Make repositories more machine-friendly to the benefit of humans
