
Open for Business Open Archives, OpenURL, RSS and the Dublin Core


DESCRIPTION

Presentation to CABI staff - 2005


Page 1: Open for Business  Open Archives, OpenURL, RSS and the Dublin Core

UKOLN is supported by:

Open for Business

Open Archives, OpenURL, RSS and the Dublin Core

Andy Powell, UKOLN, University of Bath

[email protected]

Presentation to CABI staff

www.bath.ac.uk

a centre of expertise in digital information management

www.ukoln.ac.uk


Contents

• context – metasearching and open ‘context sensitive’ linking

• bluffer’s guides to…
  – Dublin Core
  – OAI Protocol for Metadata Harvesting
  – RSS
  – OpenURL

• discussion about the benefits, problems and issues of using these standards in the publishing ‘business’ environment…


Things to note…

• this talk is about technologies…
• …but it is not intended to be overly technical
• you should leave with an understanding of what the key technologies are – but not necessarily be expert in them!


Things to note…

…please feel free to ask questions as we go through!


Context: Metasearching and context sensitive linking


The ‘problem’…

• end-user often has access to large number of heterogeneous collections - full-text, A&I, images, video, data, etc. (e.g. thru JISC licensing agreements)
• however, experience of these collections is less than optimal:
  – end-users not aware of available content
  – end-user has to interact with (search or browse) multiple different Web sites to work across range of content
  – content ‘discovery’ services not joined-up with ‘delivery’ services


Or, to put it another way…

• from perspective of ‘data consumer’
  – need to interact with multiple collections of stuff - bibliographic, full-text, data, image, video, etc.
  – delivered thru multiple Web sites
  – few cross-collection discovery services (with exception of big search engines like Google and Google Scholar, but still some issues with use of Google – e.g. the ‘invisible Web’, the lack of metadata, keywords with multiple meanings, etc.)
• from perspective of ‘data provider’
  – few agreed mechanisms for disclosing availability of content


A solution…

• an ‘information environment’
• framework of machine-oriented services allowing the end-user to
  – discover, access, use, publish resources across a range of content providers
  – move away from lots of stand-alone Web sites...
• content providers expose metadata for
  – searching, harvesting, alerting
• develop end-user services and tools that bring stuff together…
• …based on open ‘standards’


End-user services and tools

• tend to focus on library portal (metasearch) tools (e.g. Encompass, MetaLib or ZPortal)
• but, there will be lots of user-focused services and tools…
  – subject portals developed within academia
  – reading list and other tools in VLE (e.g. externally hosted by Sentient Discover)
  – commercial ‘portals’ (ISI Web of Knowledge, ingenta, Bb Resource Center, etc.)
  – SFX service component (or other OpenURL resolver)
  – personal desktop reference manager (e.g. Endnote)


Link resolvers

• ‘discovery’ is only part of the problem…
• in the case of books, journals, journal articles, end-user wants access to the most appropriate copy
• need to join up discovery services with access/delivery services (local library OPAC, ingentaJournals, Amazon, etc.)
• need localised view of available services
• linking services that provide access to the most appropriate copy
  – user and institutional preferences, cost, access rights, location, etc.


A shared problem space

• the problems outlined here are shared across sectors and communities
  – student or researcher looking for information from variety of bibliographic sources
  – lecturer searching for e-learning resources from multiple learning object repositories
  – researcher working across multiple data-sets and compute servers on the Grid
  – a GP searching the National electronic Library for Health
  – school child searching BBC, museum and library Web sites for homework project
  – someone searching across multiple e-government sites
  – even someone looking to buy or sell a second-hand car…

…the Common Information Environment


Technologies

• require global, standards-based, cross-domain solutions…
• cross-searching
  – Z39.50 (Bath Profile, a profile of Z39.50)
  – SRW (Search and Retrieve Web-service), a Web services implementation of Z39.50
• harvesting
  – OAI-PMH - Open Archives Initiative Protocol for Metadata Harvesting
• alerting
  – RSS - RDF/Rich Site Summary
• linking
  – OpenURL

…and cross-domain metadata


Bluffer’s Guide to…

Dublin Core


Bluffer’s guide to DC

1. DC short for Dublin Core

2. simple metadata standard, supporting ‘cross-domain’ resource discovery

3. original focus on Web resources but that is no longer the case – e.g. usage to describe physical artefacts in museums

4. current usage across wide range of sectors – academic, e-government, museums, libraries, business, semantic Web

http://dublincore.org/


Bluffer’s Guide to DC

• ‘simple DC’ provides 15 elements (metadata properties)

• multiple encoding syntaxes including HTML <meta> tags, XML and RDF/XML (XML schema are available)

dc:title dc:contributor dc:source

dc:creator dc:date dc:language

dc:subject dc:type dc:relation

dc:description dc:format dc:coverage

dc:publisher dc:identifier dc:rights
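One of the encoding syntaxes mentioned above, HTML <meta> tags, can be sketched in a few lines of Python. This is only an illustration: the record values and the function name `dc_meta_tags` are made up for the example, and the `DC.` prefix with a `schema.DC` link follows the commonly used DCMI convention for HTML.

```python
from html import escape

# The 15 'simple DC' element names listed on the slide.
SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dc_meta_tags(record):
    """Render a dict of simple DC elements as HTML <meta> tags.

    `record` maps an element name (e.g. "title") to a string value.
    This helper is hypothetical, written for illustration only.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        if element not in SIMPLE_DC:
            raise ValueError(f"not a simple DC element: {element}")
        # Escape the value so quotes and angle brackets cannot break the tag.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only.
    print(dc_meta_tags({
        "title": "Open for Business",
        "creator": "Andy Powell",
        "language": "en",
    }))
```

The output would be pasted into the <head> of an HTML page; whether any search engine reads it is a separate question.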


Bluffer’s Guide to DC

7. relatively slow process of adding new terms to ‘qualified DC’
  – new elements (e.g. dcterms:audience)
  – element refinements (e.g. dcterms:dateCopyrighted)
  – encoding schemes (e.g. dcterms:LCSH and dcterms:W3CDTF)
  – 48 elements and 17 encoding schemes

http://dublincore.org/documents/dcmi-terms/


Bluffer’s Guide to DC

8. DC can be embedded into HTML pages but almost none of the big search engines will use it! Why? Lack of trust…
  – meta-spam
  – meta-crap
  – but, embedding DC in HTML may be worthwhile if your own site search engine uses it

9. however, simple DC forms baseline metadata format for the OAI protocol…


Bluffer’s Guide to

OAI Protocol for Metadata Harvesting


OAI roots

• the roots of OAI lie in the development of eprint archives…
  – arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
• each offered Web interface for deposit of articles and for end-user searches
• difficult for end-users to work across archives without having to learn multiple different interfaces
• recognised need for single search interface to all archives
  – Universal Pre-print Service (UPS)


Searching vs. harvesting

• two possible approaches to building a single search interface to multiple eprint archives…
  – cross-searching multiple archives based on protocol like Z39.50
  – harvesting metadata into one or more ‘central’ services – bulk move data to the user-interface
• US digital library experience in this area indicated that cross-searching not preferred approach
  – distributed searching of N nodes viable, but only for small values of N


Harvesting requirements

• in order that harvesting approach can work there need to be agreements about…
  – transport protocols – HTTP vs. FTP vs. …
  – metadata formats – DC vs. MARC vs. …
  – quality assurance – mandatory elements, mechanisms for naming of people, subjects, etc., handling duplicated records, best-practice
  – intellectual property and usage rights – who can do what with the records
• work in this area resulted in the “Santa Fe Convention”


Development of OAI-PMH

• 2-year metamorphosis thru various names
  – Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
  – OAI Protocol for Metadata Harvesting 2.0
• development steered by international technical committee
• inter-version stability helped developer confidence
• move from focus on eprints to more generic protocol
  – move from OAI-specific metadata schema to mandatory support for DC


Bluffer’s guide to OAI

1. OAI-PMH short for Open Archives Initiative Protocol for Metadata Harvesting

2. a low-cost mechanism for harvesting metadata records
  – from ‘data providers’ to ‘service providers’

3. allows ‘service provider’ to say ‘give me some or all of your metadata records’
  – where ‘some’ is based on date-stamps, sets, metadata formats

4. eprint heritage but widely deployed
  – images, museum artefacts, learning objects, …

http://www.openarchives.org/
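As a rough sketch of what ‘give me some or all of your metadata records’ looks like on the wire, the Python below builds an OAI-PMH `ListRecords` request URL. The `verb`, `metadataPrefix`, `from`, `until` and `set` arguments are the protocol's own selective-harvesting parameters; the repository address and the helper name `list_records_url` are assumptions made for the example.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL (hypothetical helper).

    'Some' records are selected by date-stamp (from/until), by set,
    and by metadata format (metadataPrefix); omit the optional
    arguments to ask for everything in that format.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # repository.example.org is a placeholder, not a real data provider.
    print(list_records_url("http://repository.example.org/oai",
                           from_date="2005-01-01", set_spec="physics"))
```

A service provider would typically store the date of its last harvest and pass it as `from_date` next time, so that only new or changed records are transferred.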


Bluffer’s guide to OAI

5. based on HTTP and XML
  – simple, Web-friendly, fast deployment

6. OAI-PMH is not a search protocol
  – but use can underpin search-based services based on Z39.50 or SRW or SOAP or…

7. OAI-PMH carries only metadata
  – content (e.g. full-text or image) made available separately – typically at URL in metadata

8. mandates simple DC as record format
  – but extensible to any XML format – IMS metadata, IEEE LOM, ONIX, MARC, METS, MPEG-21, etc.


Bluffer’s guide to OAI

9. metadata and ‘content’ often made freely available – but not a requirement
  – OAI-PMH can be used between closed groups
  – or, can make metadata available but restrict access to content in some way

10. underlying HTTP protocol provides
  – access control – e.g. HTTP BASIC
  – compression mechanisms (for improving performance of harvesters)
  – could, in theory, also provide encryption if required
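To make the ‘HTTP and XML’ points a little more concrete, here is a minimal Python sketch of the harvester side: one function prepares a request that asks for gzip compression and optionally sends HTTP BASIC credentials, and another pulls simple DC fields out of a response. The sample response, the example.org addresses and both function names are invented for illustration; only the XML namespaces are the ones OAI-PMH 2.0 and simple DC actually use.

```python
import base64
import xml.etree.ElementTree as ET
from urllib.request import Request

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response, trimmed to one record for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2005-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:creator>A. N. Author</dc:creator>
          <dc:identifier>http://repository.example.org/eprints/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_request(url, username=None, password=None):
    """Prepare (but do not send) an HTTP request for a harvest."""
    headers = {"Accept-Encoding": "gzip"}  # ask for a compressed response
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        headers["Authorization"] = "Basic " + token.decode("ascii")
    return Request(url, headers=headers)


def parse_records(xml_text):
    """Return a list of dicts holding header fields and simple DC values."""
    records = []
    for rec in ET.fromstring(xml_text).iterfind(".//oai:record", NS):
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for child in dc:
                name = child.tag.split("}", 1)[1]  # drop the namespace part
                fields.setdefault(name, []).append(child.text or "")
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "dc": fields,
        })
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["dc"].get("title"))
```

Note that the record carries only metadata: the `dc:identifier` URL is where a service would send the user for the content itself.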


Bluffer’s Guide to…

RSS


Bluffer’s guide to RSS

1. simple XML application for sharing (syndicating) ‘news’ feeds on the Web

2. RDF Site Summary or Rich Site Summary (depending on who you ask)

3. ‘news’ can be interpreted quite loosely, e.g. new items added to database

4. uses ‘channel’ and ‘item’ terminology

5. a ‘channel’ is an XML document that is made available on a Web-site – to update the channel, simply update the XML

http://www.eevl.ac.uk/rss_primer/


Bluffer’s guide to RSS

6. each ‘item’ has simple metadata (title, description) and URL link to resource (news story or whatever)

7. RSS also provides channel branding (logo, etc.)

8. three versions currently: 0.9, 1.0 and 2.0 - 1.0 is based on RDF and is more flexible (but slightly more complex). Also worth noting Atom – an attempt to resolve some of the tensions in RSS

9. no single registry of all channels yet


Bluffer’s guide to RSS

10. fairly widespread usage, e.g. channels available from the BBC, Microsoft, Apple, … as well as from several academic sites and services (RDN, LTSN, …)

11. easy to use within ‘portals’ (e.g. uPortal)

12. lots of software and toolkits available – open source and commercial
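The ‘channel’ and ‘item’ terminology can be illustrated with a short Python sketch that reads an RSS 2.0 document. The feed below is a hand-written, hypothetical example (RSS 1.0 uses RDF namespaces and would need slightly different handling), and `read_channel` is a name chosen for this sketch rather than part of any toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 channel with two 'news' items.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example database alerts</title>
    <link>http://www.example.org/</link>
    <description>Illustrative channel only</description>
    <item>
      <title>New records added</title>
      <link>http://www.example.org/news/1</link>
      <description>New abstracts were added to the database.</description>
    </item>
    <item>
      <title>Service downtime</title>
      <link>http://www.example.org/news/2</link>
      <description>Planned maintenance this weekend.</description>
    </item>
  </channel>
</rss>"""


def read_channel(xml_text):
    """Return the channel title and a list of items (title, link, description)."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("no <channel> element found")
    items = [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
    return {"title": channel.findtext("title", default=""), "items": items}


if __name__ == "__main__":
    feed = read_channel(SAMPLE_FEED)
    print(feed["title"])
    for item in feed["items"]:
        print("-", item["title"], item["link"])
```

Updating the channel means nothing more than publishing a new version of the XML document with the latest items in it.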


Bluffer’s Guide to…

OpenURLs


OpenURL roots

• the context
  – distributed information environment (e.g. the JISC IE)
  – multiple A&I and other discovery services
  – rapidly growing e-journal collection
  – need to interlink available resources
• the problem
  – links controlled by external info services
  – links not sensitive to user’s context (appropriate copy problem)
  – links dependent on vendor agreements
  – links don’t cover complete collection

(a library perspective)


The problem

• the context
  – distributed information environment (e.g. the JISC IE)
  – multiple A&I and other discovery services
  – rapidly growing e-journal collection
  – need to interlink available resources
• the REAL problem
  – libraries have no say in linking
  – libraries losing core part of ‘organising information’ task
  – expensive collection not used optimally
  – users not well served

(a library perspective)


The solution…

• do NOT hardwire a link to a single service on the referenced item (e.g. a link from an A&I service to the corresponding full-text)
• BUT rather
  – provide a link that transports metadata about the referenced item (the OpenURL)
  – to another service that is better placed to provide service links (the OpenURL resolver, or link server)


Non-OpenURL linking

[Diagram: a reference in the link source (A&I service) carries a link to the referenced work; resolution of metadata into a link (typically a URL) leads to a single link destination (document delivery service)]


OpenURL linking

[Diagram: a reference in the link source (A&I service) is given an OpenURL (provision of OpenURL); the OpenURL provides transportation of metadata & identifiers to the OpenURL resolver; the resolver performs context-sensitive, user-specific resolution of metadata & identifiers into services, offering links to several link destinations (e.g. document delivery service)]


Example 1

• journal article
• from Web of Science to ingenta Journals


button indicating OpenURL ‘link’ is available


OpenURL resolver offering context-sensitive links, including link to ingenta


also links to other services such as Google search for related information


Example 2

• book
• from University of Bath OPAC to Amazon


button indicating OpenURL ‘link’ is available


OpenURL resolver offering context-sensitive links, including link to Amazon


also links to other services such as Google search for related information


Summary…

• OpenURL Source: ISI Web of Science, University of Bath OPAC
• OpenURL Resolver: OpenURL resolver
• OpenURL Target: ingenta, Google, Amazon


Summary (2)

• OpenURL source
  – a service that embeds OpenURLs into its user-interface in order to enable linking to most appropriate copy
• OpenURL resolver
  – a service that links to appropriate copy(ies) and other value added services based on metadata in OpenURL
• OpenURL target
  – a service that can be linked to from an OpenURL resolver using metadata in OpenURL


Bluffer’s guide to OpenURLs

1. standard for linking ‘discovery’ services to ‘delivery’ services

2. supports linking from OpenURL ‘source’ to OpenURL ‘target’ via OpenURL ‘resolver’

[Diagram: End-user; source (e.g. Web of Science) → resolver → target (e.g. ingenta)]

http://www.bath.ac.uk/openurl?genre=article&atitle=Information%20gateways:%20collaboration%20on%20content&title=Online%20Information%20Review&issn=1468-4527&volume=24&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel

BASEURL: the http://www.bath.ac.uk/openurl part of the URL above

http://www.niso.org/committees/committee_ax.html
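The example OpenURL above is just a resolver address followed by citation metadata in the query string, which the Python sketch below builds and takes apart again. The key names (`genre`, `atitle`, `issn` and so on) and the resolver address are taken from the slide's example; the function names are made up for illustration, and this is not a full implementation of either OpenURL version.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "http://www.bath.ac.uk/openurl"  # resolver address from the slide

# Citation metadata from the slide's example OpenURL.
CITATION = {
    "genre": "article",
    "atitle": "Information gateways: collaboration on content",
    "title": "Online Information Review",
    "issn": "1468-4527",
    "volume": "24",
    "spage": "40",
    "epage": "45",
    "artnum": "1",
    "aulast": "Heery",
    "aufirst": "Rachel",
}


def build_openurl(base_url, metadata):
    """What a 'source' does: append citation metadata to the resolver URL."""
    # quote_via=quote encodes spaces as %20 (as on the slide) rather than '+'.
    # It also encodes ':' as %3A, which decodes to the same value.
    return f"{base_url}?{urlencode(metadata, quote_via=quote)}"


def parse_openurl(url):
    """What a 'resolver' does first: recover the metadata from the URL."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    metadata = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return base, metadata


if __name__ == "__main__":
    url = build_openurl(BASE_URL, CITATION)
    print(url)
    print(parse_openurl(url))
```

Because only the base URL differs between institutions, a source can serve the same metadata to any user's preferred resolver by swapping `BASE_URL`.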


Bluffer’s guide to OpenURLs

3. the OpenURL is a URL that carries metadata from the ‘source’ service to the user’s preferred resolver

4. resolver typically offered by institution

5. currently deployed OpenURLs are often version 0.1 - focus on bibliographic resources (books and journal articles)

6. version 1.0 (the standard) – more generic and extensible, e.g. could carry metadata about learning objects or research data


Bluffer’s guide to OpenURLs

7. ‘sources’ need to maintain knowledge about end-user’s preferred resolver

8. resolvers and targets need to share knowledge about ‘link-to’ syntaxes

9. most library automation vendors will either have (or be developing) an OpenURL resolver solution for their customers

10. some open-source solutions also available – but expect to work quite hard with these
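The idea that resolvers and targets share knowledge about ‘link-to’ syntaxes can be sketched as a small lookup table. Everything here is hypothetical: the target names, the example.org link templates and the `resolve` function are invented for illustration, and a real resolver would hold the actual syntaxes published by each target plus the institution's own preferences and subscriptions.

```python
from urllib.parse import quote

# Hypothetical link-to templates; real targets publish their own syntaxes.
TARGETS = [
    {
        "name": "Full text (journal host)",
        "genres": {"article"},
        "requires": {"issn", "volume", "spage"},
        "template": "http://fulltext.example.org/link?issn={issn}&vol={volume}&page={spage}",
    },
    {
        "name": "Bookshop",
        "genres": {"book"},
        "requires": {"isbn"},
        "template": "http://bookshop.example.org/isbn/{isbn}",
    },
    {
        "name": "Web search",
        "genres": {"article", "book"},
        "requires": {"title"},
        "template": "http://search.example.org/?q={title}",
    },
]


def resolve(metadata, targets=TARGETS):
    """Turn OpenURL metadata into a list of (target name, link) pairs.

    A target is offered only if it handles the genre and the metadata
    contains every field its link-to template needs.
    """
    # Percent-encode every value so it is safe inside a URL.
    values = {key: quote(value, safe="") for key, value in metadata.items()}
    links = []
    for target in targets:
        if metadata.get("genre") not in target["genres"]:
            continue
        if not target["requires"].issubset(metadata):
            continue
        links.append((target["name"], target["template"].format(**values)))
    return links


if __name__ == "__main__":
    article = {"genre": "article", "title": "Online Information Review",
               "issn": "1468-4527", "volume": "24", "spage": "40"}
    for name, link in resolve(article):
        print(name, link)
```

Making the `TARGETS` list institution-specific is what would make the resulting links context-sensitive: two users following the same OpenURL could be offered different services.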


Discussion…


Summary


Summary

• protocols presented here fill space between ‘information providers’ and other services (‘portals’, VLEs, etc.)
  – allow integration of remote information resources more seamlessly
  – allow separation of ‘discovery’ and ‘content delivery’
  – enable user-focused, context-sensitive linking
  – can be viewed as ways of getting users to your site

• but… there are some issues to beware of


What can you do?

• consider exposing metadata about your content for harvesting (or searching)

• consider making ‘alerting’ channels available
• consider supporting use of OpenURLs for linking to appropriate-copy
• consider how your content will be used in e-learning context
• consider how external services ‘link to’ your resources (i.e. support persistent deep linking to your content)