An introduction to metadata for libraries, museums and archives Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported by: [email protected] http://www.ukoln.ac.uk/


An introduction to metadata for libraries, museums and archives

Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003

Pete Johnston

UKOLN, University of Bath

Bath, BA2 7AY

UKOLN is supported by:

[email protected]

http://www.ukoln.ac.uk/


Section 1 : An Introduction to Metadata


An introduction to Metadata

• Memory institutions, network services and metadata
• What is metadata?
• Exposing/sharing metadata
• Exposing/sharing metadata: semantics
  – the Dublin Core Metadata Initiative


Memory institutions, network services and metadata


Memory institutions

Museums, libraries and archives—often called memory institutions—are trusted organizations that collectively document the entire range of human experience and expression.

Memory institutions are engaged in the important work of:

• Capturing, authenticating, and making sense of cultural memory;

• Preserving the human record for future generations; and

• Sharing knowledge to support education and learning.

http://www.ukoln.ac.uk/interop-focus/ccs/positions/


Delivering services

• Memory institutions provide services to users
  – (At least some of) these services provide access to resources
• Emergence of built on global networks
  – remote access to digital resources for all (potentially…)
  – resources available “round the clock”
  – resources comparable to other digital resources from elsewhere
• Investment in
  – digitisation of cultural content
  – network services providing access to digitised content


Delivering services

• Potential for new types of service
  – “digital libraries”, “virtual museums” etc
  – integrated access to resources from multiple remote content providers
  – services defined by theme/subject/activity/audience etc, not by location/source
  – “packaging” and re-purposing of content
  – user-oriented rather than provider-oriented
• Changing user expectations
  – user wants information relevant to task/activity
    – may see structural/organisational boundaries of content providers as unimportant!
  – user wants access from any location
  – user wants access at any time


Delivering services

• Move from web sites to “portals”
  – “A network service that provides a personalised, single point of access to a range of heterogeneous network services, local and remote, structured and unstructured”
    – Andy Powell, 2002
• Content providers exposing content for delivery through multiple services, channels
• Presentation services “surfacing” content from multiple (distributed) sources
• Memory institutions may perform both roles
• Move away from “silo mentality” towards more “joined-up” approaches


Resource discovery on the Web

• Broadly two approaches to providing discovery services
  – software indexing of resource content
  – human description of resources
• Web search engines
  – software agents (robots) retrieve documents by following hyperlinks (crawling)
  – index text of documents
  – make index available as searchable database
  – some clever ranking algorithms
    – e.g. Google infers “Page Ranking” based on links to document
    – “find pages which link to page X”
    – “find pages similar to X”
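The "index text of documents, make index available as searchable database" steps can be sketched as a tiny inverted index. This is only an illustration of the general idea, not how any particular search engine works; the URLs and texts in `DOCS` are invented, and no crawling or ranking is attempted.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus standing in for crawled documents (URL -> text).
DOCS = {
    "http://example.org/a": "Metadata for libraries and museums",
    "http://example.org/b": "Museums and archives on the Web",
    "http://example.org/c": "Resource discovery with metadata",
}

def build_index(docs):
    """Map each lower-cased word to the set of documents containing it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the documents containing every word of the query (boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(DOCS)
print(sorted(search(INDEX, "metadata museums")))
```

The index records only which words occur in which document; it has no notion of an author or a document type.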


Resource discovery on the Web

• Web search engines
  – tend to generate many results
    – and may suffer from “spamming”
    – ranking algorithms may help
  – don’t support “structured search”
    – search on author name
    – search on document type (“journal article”)
  – limited to textual resources
    – generally, poor support for search for multimedia objects
• “The hidden Web”
  – robots may not crawl documents dynamically generated from databases/CMS


Resource discovery on the Web

• But automated indexing
  – is low cost
    – At least compared to human resource description
  – (usually) scales to large numbers of resources
  – can be a useful tool!
• Challenge of finding appropriate balance of approaches for context


Metadata for services

• Metadata has been important to “traditional” service provision…

• … is essential component of effective network services


What is metadata?


What is metadata?

• Simple definitions…
• ‘Structured data about data’.
  – Dublin Core Metadata Initiative FAQ, 2003
• Machine-understandable information about Web resources or other things.
  – Tim Berners-Lee, W3C, 1997


Towards a “functional” view of metadata

• Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person.

– Lorcan Dempsey & Rachel Heery, 1998

• Structured data about resources that can be used to help support a wide range of operations

– Michael Day, 2001


What resources, objects, things?

• HTML documents
• digital images
• databases
• books
• museum objects
• archival records
• metadata records
• Web sites
• collections
• services
• physical places
• people
• institutions
• abstract “works”
• concepts
• events

• Metadata might exist for almost anything
  – digital, physical, “abstract” resources


What resources, objects, things?

• Metadata records include
  – bibliographic records in library catalogues or from abstracting & indexing services
  – descriptions of archival material in archival finding aids
  – object records in museum documentation / collection management systems
  – entries in directories of organisations, individuals and services
  – descriptions of digital objects (documents, images, software)
  – descriptions of collections of digital objects
  – descriptions of network services
  – descriptions of metadata records


What operations?

• Operations by human users, software tools
• Metadata might be used to support many different functions
  – resource disclosure & discovery
  – resource management, including preservation
  – intellectual property rights management
  – commerce
  – authentication and authorisation
  – personalisation and localisation of services
• Different functions require different types/classes of metadata
  – No “one size fits all solution”
  – Need to specify “functional requirements”


Metadata elements & element sets

• Metadata describes attributes or properties of a resource
• Each attribute or property is described by a metadata element
  – Can be identified, formally documented/defined
  – May be represented in different forms
• A metadata element set
  – coherent bounded set of elements formulated as basis for metadata creation
  – created for purpose, as a unit
• Schema
  – structured representation of an element set


Metadata for resource discovery

• User wishes to
  1. discover resources according to some criteria
  2. (optionally) identify a specific resource
     – confirm that resource described is resource sought
     – distinguish similar resources
  3. select
     – evaluate, choose resource appropriate to needs
  4. locate resource
  5. obtain/access resource
  6. use resource
     – open, read, display, run, play, copy, unpackage/repackage
     – interpret content
• Resource discovery metadata supporting (primarily) operations 1 - 4


Metadata for resource discovery

Continuum of complexity/functionality

• full-text indexes
  – might not be classed as “metadata” by some!
  – generated by software tools
  – discovery (by content), location
• semantically simple forms (e.g. Dublin Core)
  – typically covering description of broad range of resources
  – maybe part generated automatically, partly human authored
  – discovery, identification, selection, location
• richer complex forms (e.g. MARC, EAD, CIMI-SPECTRUM, AMICO etc)
  – typically covering specific types of resources
  – often associated with particular community/domain
  – creation may involve relatively high degree of human expertise
  – discovery, identification, selection, location, access, use (which may be type specific)


Association of resource and metadata (1)

Metadata embedded in resource

Resource1:
  Creator = J Smith
  Date = 2001-11-05
  Title = Report

• e.g. meta elements in HTML docs; summary properties in word processor docs
• Can resource support embedding of metadata?
• Does metadata creator have write access to resource?
• Can service extract embedded metadata?
• Metadata about aggregates of resources?
• Metadata about people, places, concepts?


Association of resource and metadata (2)

Metadata record as separate object; record identifier embedded in resource

Resource1:
  Metadata rec = 1

Metadata rec 1:
  Creator = J Smith
  Date = 2001-11-05
  Title = Report

• e.g. link elements in HTML docs
• Metadata record may be remote from resource
• Can resource support embedding of link?
• Does metadata creator have write access to resource?
• Can service follow link to metadata record?
• What happens when resource deleted?
• Metadata about aggregates of resources?
• Metadata about people, places, concepts?


Association of resource and metadata (3)

Metadata record as separate object; resource identifier in metadata record

Resource1

Metadata rec 1:
  Doc = 1
  Creator = J Smith
  Date = 2001-11-05
  Title = Report

• Metadata record may be remote from resource
• Does not require embedding of metadata or link
• Does not require metadata creator to have write access to resource
• Metadata record created independently of resource – possibly multiple records
• Service uses metadata records independently of resource
• Metadata record may persist after resource deleted
• Metadata record can describe anything (with identifier…)


Metadata as managed resource

Doc | Creator | Date       | Title
1   | J Smith | 2001-11-05 | Report

• Metadata record is used separately from resource described
• Recognition that metadata is resource to be managed, separately from resource described
• Metadata content stored in “database”, exposed in form(s) appropriate for service(s)


Exposing/sharing metadata


How is metadata exposed/shared?

• Resource description “communities”
  – characterised by consensus on conventions for internal exchange of metadata
• Metadata for resource discovery
  – is used beyond its creator community
  – is combined/compared with metadata from other communities
  – is aggregated or cross-searched by services
• How does a content provider make metadata records available in a commonly understood form?
• How does a service provider obtain these metadata records from data providers?


How is metadata exposed/shared?

• Effective sharing of information expressed in metadata record requires agreement on
  – metadata semantics
    – what metadata elements mean
  – metadata structure
    – data model, relationships of component parts
  – metadata syntax
    – rules of expression
  – protocols
    – how metadata records transmitted between content provider and service provider
• Agreements formalised as specifications and standards (ideally…)


Exposing/sharing metadata: semantics

Introducing the Dublin Core


Introducing the Dublin Core

• Initiative to improve resource discovery on Web
  – not for complex resource description
  – based on description of simple “document-like objects”
  – extended to other classes of resource
• International, cross-disciplinary consensus on simple element set
  – 15 elements
  – all optional
  – all repeatable

http://dublincore.org/


Introducing the Dublin Core (2)

• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
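One way to picture "15 elements, all optional, all repeatable" is a record in which any element may be absent and every present element holds a list of values. The sketch below is an assumption made for illustration: the dict-of-lists shape and the helper name `make_record` are not part of Dublin Core itself.

```python
# The fifteen element names listed above, lower-cased for use as keys.
DC_ELEMENTS = {
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def make_record(**values):
    """Build a simple DC description as a dict of element name -> list of values.

    Every element is optional (it may be absent) and repeatable (a list),
    so the only check is that each name belongs to the element set.
    """
    record = {}
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        # Accept a single string or a list of strings for convenience.
        record[name] = [value] if isinstance(value, str) else list(value)
    return record

record = make_record(
    title="An introduction to metadata",
    subject=["Resource discovery", "Metadata"],
)
print(record)
```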


Dublin Core: creator

• Term Name: creator
• Label: Creator
• Definition: An entity primarily responsible for making the content of the resource.
• Comment: Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.
• Type of Term: element
• Status: recommended
• Date issued: 1999-07-02
• URI: http://purl.org/dc/elements/1.1/creator


Dublin Core: date

• Term Name: date
• Label: Date
• Definition: A date associated with an event in the life cycle of the resource.
• Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
• Type of Term: element
• Status: recommended
• Date issued: 1999-07-02
• URI: http://purl.org/dc/elements/1.1/date
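A service consuming Date values might check that they follow the recommended YYYY-MM-DD form before relying on them. The following is a minimal sketch under that assumption; the function name `is_w3cdtf_date` is invented here, and only the complete-date form is checked, although the W3CDTF profile also permits other granularities such as a year alone.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def is_w3cdtf_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 30 February
    except ValueError:
        return False
    return True

for candidate in ("2001-11-05", "2001-02-30", "05/11/2001"):
    print(candidate, is_w3cdtf_date(candidate))
```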


Standardisation of Dublin Core

CEN Workshop Agreement (EU)
• 2000: Dublin Core elements endorsed as CWA13874
• Usage guidelines for European industry

NISO Z39.85 (USA)
• 2001: National Information Standards Organization, an ANSI affiliate

ISO
• 2002: Dublin Core Metadata Element Set approved as ISO 15836


Using the Dublin Core

• Tom Baker, “A Grammar of Dublin Core”, D-Lib Magazine, October 2000
• Metaphor of metadata as language
• DC as a simple “pidgin” language for use by “tourists on the Internet commons”
• Small vocabulary, simple grammar/structure
  – This Resource has Title “An introduction to metadata”
  – This Resource has Subject “Resource discovery”
• Not subtly expressive, but easy to learn and deploy - “good enough” to work


Using the Dublin Core

• Designed for simplicity of semantics, ease of use
• Provides basic semantic interoperability
  – semantics sufficiently general to be useful across domains
• Can provide 15 “windows” into richer resource descriptions
  – disclose rich description in simple form
  – semantic cross-walks, mappings
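A cross-walk of the kind mentioned above can be sketched as a lookup table from a richer record's fields to DC elements. The local field names in `CROSSWALK` are hypothetical and do not come from any real museum or library standard; a real mapping would be agreed by the community that owns the richer format.

```python
# Hypothetical crosswalk: local field name -> Dublin Core element.
CROSSWALK = {
    "object_name": "title",
    "maker": "creator",
    "production_date": "date",
    "brief_description": "description",
    "copyright_statement": "rights",
}

def to_simple_dc(rich_record):
    """Project a rich record onto simple DC, dropping fields with no mapping."""
    dc = {}
    for field, value in rich_record.items():
        element = CROSSWALK.get(field)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc

rich = {
    "object_name": "Report",
    "maker": "J Smith",
    "production_date": "2001-11-05",
    "storage_location": "Shelf 12",  # no DC equivalent in this mapping
}
print(to_simple_dc(rich))
```

The richer record is left untouched; the DC view is derived from it, and detail that has no DC "window" simply does not appear in the simple description.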


Using the Dublin Core

[Diagram: a rich description exposed as a simple DC description through the elements title, creator, date, description and rights]


Qualifying Dublin Core

• Allows for controlled extensibility through “qualifiers”
  – Element refinements
    – make element meanings narrower, more specific:
      – a Date Created versus Date Modified
      – an IsReplacedBy versus Replaces Relation
  – Encoding schemes
    – provide contextual information or parsing rules that aid in the interpretation of a value
    – may specify that a value is drawn from a controlled vocabulary (e.g. LCSH, TGN etc)
    – may specify that a value is formatted in accordance with a specified notation (e.g. date formats)


Qualifying Dublin Core

• Qualifiers make elements more specific
  – Element Refinements narrow meanings, never extend
  – Encoding Schemes give context to element values
• The “dumb-down” rule
  – Application should be able to use the value as if it were unqualified
  – Ignore unknown Encoding Schemes
  – Resolve (semantically more specific) Element Refinements to (more generic) Elements
• Some loss of specificity, but still generally correct and useful for discovery
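The "dumb-down" rule can be sketched as a small transformation over qualified statements. The representation of a statement as a `(term, scheme, value)` tuple is an assumption made for this example, and for brevity the sketch ignores every encoding scheme rather than only the unknown ones; the refinement names in `REFINES` are among those defined by DCMI for Date and Relation.

```python
# Element refinements mapped to the (more generic) element they refine.
REFINES = {
    "created": "date",
    "modified": "date",
    "valid": "date",
    "isReplacedBy": "relation",
    "replaces": "relation",
}

def dumb_down(statements):
    """Reduce qualified statements to simple (element, value) pairs."""
    simple = []
    for term, scheme, value in statements:
        element = REFINES.get(term, term)  # resolve a refinement to its element
        simple.append((element, value))    # the encoding scheme is dropped
    return simple

qualified = [
    ("created", "W3CDTF", "2001-11-05"),
    ("replaces", "URI", "http://example.org/report-v1"),
    ("title", None, "Report"),
]
print(dumb_down(qualified))
```

After dumbing down, "created 2001-11-05" reads as "date 2001-11-05": less specific, but still a correct statement about the resource.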


Dublin Core: valid

• Term Name: valid
• Label: Valid
• Definition: Date (often a range) of validity of a resource.
• Type of Term: element-refinement
• Status: recommended
• Date issued: 2000-07-11
• URI: http://purl.org/dc/terms/valid


Using the Dublin Core

• Not a replacement for richer descriptive standards
• But useful
  – If you wish to disclose community-specific metadata to other communities using commonly understood semantics
  – If you wish to provide integrated access to your own metadata databases with different underlying semantics
  – If you only need simple metadata semantics


Using the Dublin Core

• Inherent tensions in DC
  – Broad, fuzzy “search buckets” or rigidly prescribed usage?
  – Generic applicability across domains or intra-domain precision?
  – One-size-fits-all or customise-as-you-please?
  – Simply discovering resources (a few typical search attributes) or describing them fully (lots of detail)?
  – Dublin Core primarily as a native record format or extracted from richer metadata?
  – Broad-brush minimalism or comprehensive structuralism?


Summary

• Emergence of global networks enables new approaches to providing access to resources
  – Increasing requirement to provide resource discovery across boundaries
• Metadata supports many functions, including resource discovery
• DC as simple, cross-disciplinary metadata element set
• Next:
  – How metadata records are represented: syntax/structure
  – How metadata records are exposed/shared/used in resource discovery services


Section 2 : Sharing metadata: XML and the OAI Protocol for Metadata Harvesting


Sharing metadata : XML and OAI

• Exposing/sharing metadata: syntax and structure
  – Extensible Markup Language (XML)
  – XML Schema
• Metadata harvesting
  – The Open Archives Initiative Protocol for Metadata Harvesting
• Some OAI-based services
• Developing metadata-based services


Exposing/sharing metadata: syntax and structure

XML & XML Schema


Embedding DC metadata in (X)HTML

• Dublin Core metadata can be embedded into (X)HTML documents
  – Simple to deploy but may be difficult to manage, maintain
• But almost none of the Web search engine services index it
• Lack of trust in “open” Web context
  – Abuse by content providers seeking to improve the ranking of their documents
• However, may be useful technique in “closed” context
  – e.g. single Web site or where control over which documents indexed


Embedding DC metadata in (X)HTML

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />

<meta name="DC.Title" lang="en" content="Expressing Qualified Dublin Core in HTML/XHTML meta elements" />

<meta name="DC.Creator" content="Andy Powell, UKOLN, University of Bath" />

<meta name="DC.Date.Issued" scheme="W3CDTF" content="2002-09-09" />

<meta name="DC.Identifier" scheme="URI" content="http://dublincore.org/documents/dcq-html/" />

<meta name="DC.Format" scheme="IMT" content="text/html" />

<meta name="DC.Type" scheme="DCMIType" content="Text" />

</head>

<body>

</body>

</html>
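A local indexing service in a "closed" context could pull such embedded statements out of a page with a few lines of code. The sketch below uses Python's standard `html.parser` module; the class name `DCMetaExtractor` and the shortened `PAGE` sample are made up for illustration, and anything not named `DC.*` is simply skipped.

```python
from html.parser import HTMLParser

class DCMetaExtractor(HTMLParser):
    """Collect (name, scheme, content) from <meta> elements named DC.*"""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.statements.append(
                (name, attributes.get("scheme"), attributes.get("content"))
            )

PAGE = """<html><head>
<meta name="DC.Creator" content="Andy Powell, UKOLN, University of Bath" />
<meta name="DC.Date.Issued" scheme="W3CDTF" content="2002-09-09" />
<meta name="keywords" content="not Dublin Core" />
</head><body></body></html>"""

extractor = DCMetaExtractor()
extractor.feed(PAGE)
for statement in extractor.statements:
    print(statement)
```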


Introducing XML

• Extensible Markup Language
  – Recommendation of W3C, 1998, 2000
• Defines means of describing tree-structured data in text-based format
  – embedded markup delimits and describes data
• Simple, platform-independent syntax
• Standard programming interfaces
  – reusable software components
• Support from major software vendors
• Widely adopted for transferring data between programs, systems


<table>

<record>

<doc>1</doc>

<creator>J Smith</creator>

<date>2001-11-05</date>

<title>Report</title>

</record>

</table>

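[Diagram: the same data shown as a table row (Doc 1, Creator J Smith, Date 2001-11-05, Title Report) and as a tree in which table contains record, and record contains doc (1), creator (J Smith), date (2001-11-05) and title (Report)]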


[Diagram: database rows (Doc, Creator, Date, Title) are serialised as XML <record> elements, transmitted, and de-serialised by a remote application]
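The serialise, transmit, de-serialise cycle can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration only: the element names follow the table/record example used in these slides, the function names are invented, and the transmission step is reduced to passing a string.

```python
from xml.etree import ElementTree as ET

FIELDS = ("doc", "creator", "date", "title")

def serialise(rows):
    """Turn a list of row dicts into an XML string (<table> of <record>s)."""
    table = ET.Element("table")
    for row in rows:
        record = ET.SubElement(table, "record")
        for field in FIELDS:
            ET.SubElement(record, field).text = row[field]
    return ET.tostring(table, encoding="unicode")

def deserialise(xml_text):
    """Rebuild the list of row dicts from the XML string."""
    table = ET.fromstring(xml_text)
    return [
        {field: record.findtext(field) for field in FIELDS}
        for record in table.findall("record")
    ]

rows = [{"doc": "1", "creator": "J Smith", "date": "2001-11-05", "title": "Report"}]
wire = serialise(rows)       # the text that would be transmitted
received = deserialise(wire)  # what the remote application rebuilds
print(wire)
print(received == rows)
```

Note that the parser rejects text that is not well formed, for example a `<creator>` element closed with a mismatched end tag.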


XML and interoperability

• “Meta-language”
  – language for describing markup languages
  – can define unlimited number of markup languages
• But….
  – XML says nothing about what your names mean
  – will a software agent process my <doc> XML element correctly?
• Interoperability requires consensus on
  – the names of components (XML elements and attributes)
  – the structural model of a class of document
  – the semantics represented by the components and the structure
• Shared use of common XML “schemas”


XML schemas

• Means to codify syntax/structure rules for class of XML document
  – what markup is allowed
  – structural constraints on use of markup
• Document Type Definition (DTD)
  – part of XML Recommendation
• W3C XML Schema
  – W3C recommendation
  – data-typing i.e. tighter control on element content
  – support for XML Namespaces
  – uses XML syntax
• Software can validate instance against DTD/schema


Metadata harvesting: The Open Archives Initiative Protocol for Metadata Harvesting


Searching & harvesting

• Resource discovery services operating across the resources of multiple distributed content providers
• Possible strategies
  – Distributed search
    – submit parallel queries to multiple metadata databases
    – collate multiple result sets for presentation to user
  – Harvest
    – gather metadata records from multiple providers into single database
    – (periodic re-gathering to refresh data)
    – query central database
• Performance issues in cross-searching
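The two strategies can be contrasted in a minimal sketch. Everything here is an assumption for illustration: the providers are in-memory dictionaries with invented names and records, and "searching" is a simple substring match on titles.

```python
# Hypothetical providers: each maps a record id to a title.
PROVIDERS = {
    "provider_a": {"a1": "Metadata for archives", "a2": "XML basics"},
    "provider_b": {"b1": "Museum metadata", "b2": "Harvesting protocols"},
}


def search_one(records, term):
    """Return ids of records whose title contains the term (case-insensitive)."""
    return sorted(rid for rid, title in records.items()
                  if term.lower() in title.lower())


def distributed_search(providers, term):
    """Distributed search: query every provider at search time, then collate."""
    hits = []
    for name in sorted(providers):
        hits.extend(search_one(providers[name], term))
    return hits


def harvest(providers):
    """Harvest: gather all records into one central database (re-run to refresh)."""
    central = {}
    for records in providers.values():
        central.update(records)
    return central


central_db = harvest(PROVIDERS)
print(distributed_search(PROVIDERS, "metadata"))
print(search_one(central_db, "metadata"))
```

Both calls find the same records. The difference is where the work happens: the distributed search contacts every provider for each query, whereas the harvest pays that cost once, in advance, and then queries locally.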


Introducing OAI

• Open Archives Initiative
  – develops/promotes interoperability standards to facilitate dissemination of content
  – roots in “e-prints” community seeking to improve access to scholarly publications
    – Deposit pre-prints – for quicker dissemination
    – Deposit post-prints – to reduce institutional costs, maximise impact
  – e-print “archives”
    – institutional
    – federated subject/discipline-based
  – required simple low-cost interface to expose metadata for reuse

http://www.openarchives.org/


Introducing OAI (2)

• Terminology
  – “Archive” = repository, not archive
  – “Open” in terms of architecture, not free/unlimited access to repository
• Protocol for Metadata Harvesting (OAI-PMH)
  – Developed by international technical committee, 1999-2002
  – Shift from “optimising discovery of e-prints” to more generic resource discovery
  – OAI “committed to version 2.0 as a production release”


Introducing OAI PMH

• Lightweight, low-cost protocol which allows data providers to expose metadata records for retrieval by service providers
• Service providers can say “give me all/some of your metadata records”
• Built on HTTP, XML
  – Six verbs: requests from service provider to data provider sent using HTTP GET/POST
  – responses from data provider to service provider as XML documents
• Not a distributed search protocol
• Not limited to e-print archives


Introducing OAI PMH (2)

• Supports transfer of metadata records
  – resources made available separately
  – identifier/locator of resources typically included in metadata record
• Data provider must provide simple/unqualified DC metadata record
  – may provide metadata records in other “formats”
  – metadata formats must be associated with a W3C XML Schema
• Extensible framework for metadata about
  – repository, sets, records
• Metadata and resources often freely available
  – but not a requirement


Introducing OAI PMH (3)

• Supports selective harvesting
  – by sets
  – by datestamps
• Example
  – Service Provider: List all records added since Jan 1 2002 in simple DC format (oai_dc)
    – verb = ListRecords
    – from = 2002-01-01
    – metadataPrefix = oai_dc
    – http://www.myarchive.org/cgi-bin/oai?verb=ListRecords&from=2002-01-01&metadataPrefix=oai_dc
  – Data Provider: Returns XML document containing records
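As a sketch of how a service provider might assemble such a request, the following builds the example URL from its parts using Python's standard library. The helper name `build_request` is an assumption for illustration. The base URL is the example archive from the slide, and the six verb names are those defined by OAI-PMH 2.0. No network request is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def build_request(base_url, verb, **arguments):
    """Build an HTTP GET URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for key, value in arguments.items():
        # "from" is a Python keyword, so callers pass it as "from_".
        params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


url = build_request("http://www.myarchive.org/cgi-bin/oai", "ListRecords",
                    from_="2002-01-01", metadataPrefix="oai_dc")
print(url)
```

Selective harvesting by set would add a `set` argument in the same way.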


[Diagram: websites holding resources and metadata, and portal websites, linked by OAI-PMH carrying DC metadata]


OAI DC metadata record (from Library of Congress Repository 1)

<oai_dc:dc>

<dc:title>Empire State Building. [View from], to Central Park</dc:title>

<dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>

<dc:date>1932 Jan. 19</dc:date>

<dc:type>image</dc:type>

<dc:type>two-dimensional nonprojectible graphic</dc:type>

<dc:type>Cityscape photographs.</dc:type>

<dc:type>Acetate negatives.</dc:type>

<dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>

<dc:coverage>United States--New York (State)--New York.</dc:coverage>

<dc:rights>No known restrictions on publication.</dc:rights>

</oai_dc:dc>
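A harvester receiving such a record has to read the repeated elements as lists of values. The sketch below does this with Python's standard library. The record is abridged from the one above, and the two `xmlns` declarations (omitted on the slide) are added so the fragment parses on its own. The function name `dc_fields` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Empire State Building. [View from], to Central Park</dc:title>
  <dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
  <dc:type>image</dc:type>
  <dc:type>Cityscape photographs.</dc:type>
  <dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
</oai_dc:dc>"""

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def dc_fields(xml_text):
    """Map each DC element name to the list of values it has in the record."""
    fields = defaultdict(list)
    for child in ET.fromstring(xml_text):
        if child.tag.startswith(DC_NS):
            fields[child.tag[len(DC_NS):]].append(child.text)
    return dict(fields)


fields = dc_fields(RECORD)
print(fields["type"])
print(fields["identifier"])
```

Every simple DC element is optional and repeatable, which is why each name maps to a list rather than a single value.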


Some OAI based services


Resource Discovery Network (RDN)

• Co-operative network of “subject gateways”
  – Funded by JISC for HE and FE
• Seven “hubs”
  – ALTIS: Hospitality, Leisure, Sport and Tourism
  – BIOME: Health and Life Sciences
  – EEVL: Engineering, Mathematics and Computing
  – GESource: Geography and Environment
  – Humbul: Humanities
  – PSIgate: Physical Sciences
  – SOSIG: Social Sciences, Business and Law
• Databases of metadata records describing Internet resources selected for high quality

http://www.rdn.ac.uk/


Resource Discovery Network (RDN)

• Hubs as subject communities
  – metadata creators are subject specialists
  – good links with users
  – separate metadata schemas
• Hubs provide their own Web interfaces
  – search databases
  – other services: tutorials, guides, alerting etc
• But operate within a shared policy framework
  – collection development
  – cataloguing guidelines
  – technical standards
  – agreements on IPR


Resource Discovery Network (RDN)

• RDN Resource Finder
  – Cross-search of Hubs’ metadata records
  – Initially distributed search using Z39.50
    – Performance issues
    – Difficult to build flexible browse interface
  – Now using OAI PMH to harvest records
    – Currently harvesting simple DC
    – Basic keyword searching
    – Exploring harvesting some richer record formats for additional functionality
• Also some sharing of metadata
  – between Hubs (DC plus extensions)
  – between Hubs and other similar services (LOM)
  – but Hubs’ metadata not freely available for harvest


Resource Discovery Network http://www.rdn.ac.uk/



e-Prints UK

• JISC-funded project, 2002-2004
• Provide access to e-prints via subject-based RDN services
• Harvest metadata from e-print archives
  – institutional, non-institutional, personal
• Automatically enhance harvested metadata (using Web Services)
  – Add (or validate) authoritative forms of author names (OCLC)
  – Assign subject classification (based on analysis of full-text of resource) (OCLC)
  – Generate OpenURLs from citations (based on analysis of full-text of resource) (Univ of Southampton/UKOLN)

http://www.rdn.ac.uk/projects/eprints-uk/


e-Prints UK

• Provide search services
  – across all metadata
  – subject-partitioned search services for Hubs
• Enhanced metadata records made available to originating e-print archive
• Note
  – service provider enhancing harvested metadata to provide more functionality
  – some of enhancement process requires access to resource as well as metadata record
  – two-way flow of metadata records
  – recommendations for how to use simple DC to describe e-prints to maximise benefits of metadata disclosure


e-Prints UK

[Diagram: e-Prints UK at the centre. Inputs: institutional, personal and non-institutional e-print archives (OAI-PMH). Supporting Web services (SOAP): subject classification service and name authority service (offered by OCLC), citation analysis service (offered by Southampton). Outputs: RDN gateway/portal services, i.e. end-user services through the RDN (SOAP, Javascript/HTTP, Z39.50).]


Developing metadata-based services


Developing services

• Consensus on metadata semantics/syntax, transport protocols etc as minimal requirements
• Resource selection
  – collections policies
• Metadata quality assurance
  – “cataloguing rules”
    – mandatory elements, minimum-level records
    – guidance on content of values of elements: formats, controlled vocabularies, identifiers etc
  – Maintenance, currency of metadata
• Agreements on IPR, usage rights, “branding”
  – for metadata records as well as resources
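Some quality-assurance rules of this kind can be checked automatically. The sketch below is illustrative only: the list of mandatory elements and the controlled vocabulary for `type` are invented for the example, not taken from any service's actual cataloguing rules.

```python
# Hypothetical cataloguing rules, for illustration only.
MANDATORY = ("title", "identifier")          # minimum-level record
ALLOWED_TYPES = {"image", "text", "sound"}   # controlled vocabulary for "type"


def check_record(record):
    """Return a list of problems found in a record (element name -> list of values)."""
    problems = []
    for element in MANDATORY:
        if not record.get(element):
            problems.append(f"missing mandatory element: {element}")
    for value in record.get("type", []):
        if value not in ALLOWED_TYPES:
            problems.append(f"type not in controlled vocabulary: {value}")
    return problems


good = {"title": ["A photograph"], "identifier": ["http://example.org/doc/1"],
        "type": ["image"]}
bad = {"title": ["A photograph"], "type": ["photo"]}
print(check_record(good))
print(check_record(bad))
```

A check like this can run on the service provider's side after harvesting, or on the data provider's side before records are exposed.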


Developing services

• DCMES intended to be simple enough for creation by untrained creators

– assumption that metadata creation straightforward?

• Recognition that precision in services depends on quality of metadata

• Subject terms/classification difficult for non-expert

• Different services providing different functionality to different audiences may require different metadata


Developing services

• Human creation of metadata is not cheap!
• Where possible, use automated methods to
  – Generate metadata
  – Normalise/enhance metadata
• Service providers as well as data providers can contribute (e.g. e-Prints UK)
• Reuse/repurpose metadata
• Where human creation required, provide support
  – Education, guidelines
  – Appropriate software tools


Developing services

• Service developers use/implement metadata standards in pragmatic way
• Standards creators concerned with
  – Consensus, commonality, interoperability
  – e.g. DCMES
• Implementers concerned with
  – Functionality, specificity, localisation
  – e.g. “Using simple DC to describe e-Prints”
• “Application profile”
  – A metadata element set optimised for a particular application


Summary

• Standards for metadata semantics
• XML as syntax for metadata exchange, but requires consensus on structures
• Harvesting model as alternative to distributed search
  – OAI PMH
• Service provision
  – metadata quality
  – rights issues
  – application profiles
• Next:
  – A common framework for metadata?
  – Towards the “Semantic Web”?


Section 3 : Sharing metadata: RDF and the Semantic Web


Sharing metadata: RDF & the Semantic Web

• Is there a problem?
• The vision of the “Semantic Web”
• Introducing RDF
• Some RDF applications


The problem with XML?

• XML as a mechanism for expressing tree-structured data
• Different communities make different design choices for the meaning of their trees
  – All “good” (and valid against XML DTD/Schema)
• Within resource description community, meaning(s) of structure(s) may be limited
• But applications working across communities have to work with multiple XML trees
  – potentially unlimited
  – not scalable in an “open” Web environment?
  – how to manage ever increasing set of conventions
  – always encountering new structures/schemas


The “Semantic Web”

• Activity of World Wide Web Consortium (W3C)
• To make data available on the Web in a form which is easier for machines to process
  – Machine-processable statements about all kinds of things (Web pages, organisations, people, concepts, products, etc) and the relationships/links between them
• To share data between programs and systems designed independently
  – Unlock the data held in databases
  – Link data from different sources
  – To enable richer, more flexible services

http://www.w3.org/2001/sw/


The “Semantic Web”

• Builds on
  – use of Uniform Resource Identifiers (URIs) to uniquely identify resources
  – the Resource Description Framework (RDF) as a common model for expressing information about resources
  – an XML syntax for representing RDF data
  – existing Web protocols (HTTP) for transferring data


Introducing RDF


Introducing RDF

• Resource Description Framework
  – Model & Syntax, W3C Recommendation, 1999
  – RDF Core WG activity, 2001-2003
• Set of revised/expanded specifications currently (April 2003) in “last call”
  – Semantics: formal model
  – Concepts: abstract syntax (graph)
  – RDF/XML syntax: conventions for encoding statements using XML
  – Test Cases
  – Vocabulary Description Language
  – Primer: introduction

http://www.w3.org/RDF/


Introducing RDF (2)

• Provides generic framework for representing information about resources

– set of conventions/infrastructure for applications exchanging metadata

– allows semantics to be defined by different resource description communities

– accommodates mixing of information from diverse sources

• Resource : any object identified by URI– not necessarily accessible via Web

• Property : “attribute” to describe resource– properties also uniquely identified by URI

• Statement : “triple” of specific resource, property, and value


The RDF model

[Graph: http://example.org/doc/1 --author--> “John”]

A resource has some property whose value is either (i) a simple string value (literal)…

• The resource identified by the URI http://example.org/doc/1 has a property “author” whose value is “John”

• Or, “John” is the “author” of the resource identified by http://example.org/doc/1
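A statement of this kind is small enough to model directly. The sketch below represents one as a named tuple. The class and function names are assumptions for illustration, and the property is written as the plain name “author”, as on the slide, rather than as a URI.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement ("triple")."""
    subject: str    # URI of the resource being described
    predicate: str  # the property (a plain name here; RDF proper uses a URI)
    obj: str        # the value: a literal string or another resource's URI


def describe(st):
    """Render a statement as the English sentence used on the slide."""
    return (f'The resource identified by the URI {st.subject} '
            f'has a property "{st.predicate}" whose value is "{st.obj}"')


statement = Statement("http://example.org/doc/1", "author", "John")
print(describe(statement))
```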


The RDF model (2)

… or (ii) another resource...

[Graph]
http://example.org/doc/1 --author--> (unnamed resource)
(unnamed resource) --name--> “John”
(unnamed resource) --email--> “[email protected]”

• The value of property “author” is another resource which has a property “name” with value “John” and a property “email” with value “[email protected]”


The RDF model (3)

… which may itself have a URI

[Graph]
http://example.org/doc/1 --author--> http://example.org/person/john
http://example.org/person/john --name--> “John”
http://example.org/person/john --email--> “[email protected]”


The RDF model (4)

Properties themselves are identified by URIs

[Graph]
http://example.org/doc/1 --http://example.org/author--> http://example.org/person/john
http://example.org/person/john --http://example.org/name--> “John”
http://example.org/person/john --http://example.org/email--> “[email protected]”


The power of the RDF model

• Extensible model
  – supports any vocabularies
• Supports arbitrary complexity of description
• URIs as unique “fixed points” to identify
  – resources
  – properties
• Descriptions created independently can be “merged” using URIs as “anchors”
  – i.e. supports distributed metadata


First source

[Graph]
http://example.org/doc/1 --author--> http://example.org/person/john
http://example.org/person/john --name--> “John”
http://example.org/person/john --email--> “[email protected]”


Second source

[Graph]
http://example.org/doc/1 --subject--> “XML”


Third source

[Graph]
http://example.org/person/john --organisation--> “JS Foundation”


Three descriptions merged

[Graph]
http://example.org/doc/1 --author--> http://example.org/person/john
http://example.org/doc/1 --subject--> “XML”
http://example.org/person/john --name--> “John”
http://example.org/person/john --email--> “[email protected]”
http://example.org/person/john --organisation--> “JS Foundation”
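The merge can be sketched with plain sets of (subject, property, value) tuples. This is an illustration of the idea rather than a real RDF store. The function names are invented, properties are written as plain names, and the “email” statement is left out because the address is not legible in the slides.

```python
DOC = "http://example.org/doc/1"
JOHN = "http://example.org/person/john"

# Three descriptions, created independently.
first = {(DOC, "author", JOHN), (JOHN, "name", "John")}
second = {(DOC, "subject", "XML")}
third = {(JOHN, "organisation", "JS Foundation")}

# Merging is set union: statements sharing a URI join up automatically.
merged = first | second | third


def values(graph, subject, prop):
    """All values of a property for a given subject."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


def author_organisations(graph, doc):
    """Follow doc --author--> person --organisation--> name."""
    orgs = []
    for author in values(graph, doc, "author"):
        orgs.extend(values(graph, author, "organisation"))
    return orgs


print(author_organisations(first, DOC))
print(author_organisations(merged, DOC))
```

No single source can answer "which organisation does the author of this document belong to?". The merged graph can, because both descriptions use the same URI for John.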


A simple DC metadata record (the “hedgehog”)

[Graph: http://example.org/doc/1 at the centre, with one arc to a value for each simple DC property: dc:title, dc:creator, dc:subject, dc:description, dc:publisher, dc:contributor, dc:date, dc:type, dc:format, dc:identifier, dc:source, dc:language, dc:relation, dc:coverage, dc:rights]


The RDF XML syntax

• XML representation of model
  – to store/exchange descriptions
• Use of XML Qualified Names and XML Namespaces to represent URIs in RDF/XML
• Conventions for the meaning of structures in RDF/XML document
• Service can “know in advance” the meaning of structures in RDF/XML document
  – i.e. always represents RDF graphs
  – even if unanticipated vocabularies used
  – can read multiple descriptions into store and “merge” on URIs


A simple DC metadata record (RDF/XML)

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:creator>a</dc:creator>
    <dc:contributor>b</dc:contributor>
    <dc:publisher>c</dc:publisher>
    <dc:subject>d</dc:subject>
    <dc:description>e</dc:description>
    <dc:identifier>f</dc:identifier>
    <dc:relation>g</dc:relation>
    <dc:source>h</dc:source>
    <dc:rights>i</dc:rights>
    <dc:format>j</dc:format>
    <dc:type>k</dc:type>
    <dc:title>l</dc:title>
    <dc:date>m</dc:date>
    <dc:coverage>n</dc:coverage>
    <dc:language>o</dc:language>
  </rdf:Description>
</rdf:RDF>
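To show how qualified names stand for URIs, the sketch below reads a shortened version of this record back into triples using only Python's standard XML parser. It is not an RDF parser. It handles only this flat pattern (an `rdf:Description` with `rdf:about` and literal-valued child elements), and the function name `simple_triples` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:creator>a</dc:creator>
    <dc:title>l</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(xml_text):
    """Extract (subject, property URI, literal) triples from flat RDF/XML."""
    triples = []
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree expands a qualified name to "{namespace}local";
            # joining the two parts gives the property's URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples


for triple in simple_triples(RDF_XML):
    print(triple)
```

The `dc:` prefix itself carries no meaning. What identifies the property is the namespace URI joined to the local name, for example http://purl.org/dc/elements/1.1/creator.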


RDF Vocabulary Description Language (RDF Schema)

• Provides mechanisms to describe
  – terms used in RDF statements
  – relationships between terms
  – e.g. Dublin Core metadata element set described using RDF(S)
• Defines type system
  – resources grouped into classes
  – classes may be related hierarchically (subClassOf)
  – properties may be related hierarchically (subPropertyOf)
  – use of properties may be constrained (domain, range)
• More RDF statements
  – i.e. metadata about metadata elements
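One practical use of a property hierarchy is inference: if one property is declared a sub-property of another, every statement made with the first also holds for the second. The sketch below applies that rule to tuples. The property `ex:illustrator` and its relationship to `dc:contributor` are invented for the example, and the prefixed names stand in for full URIs.

```python
# Hypothetical schema statement: ex:illustrator rdfs:subPropertyOf dc:contributor.
SUB_PROPERTY_OF = {"ex:illustrator": "dc:contributor"}


def infer(triples, sub_property_of):
    """Add (s, q, o) for every (s, p, o) where p is a sub-property of q."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        for s, p, o in list(inferred):
            parent = sub_property_of.get(p)
            if parent and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


data = {("http://example.org/doc/1", "ex:illustrator", "Jane")}
for triple in sorted(infer(data, SUB_PROPERTY_OF)):
    print(triple)
```

A service that understands only `dc:contributor` can then still make use of descriptions written with the more specific, community-defined property.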


Description of Dublin Core Creator

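[Graph]
http://purl.org/dc/elements/1.1/creator --rdfs:label--> “Creator”
http://purl.org/dc/elements/1.1/creator --rdfs:comment--> “An entity …”
http://purl.org/dc/elements/1.1/creator --dc:description--> “Examples of a …”
http://purl.org/dc/elements/1.1/creator --rdf:type--> http://www.w3.org/1999/02/22-rdf-syntax-ns#Property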


Description of Dublin Core Creator (RDF/XML)

<rdf:Property rdf:about="http://purl.org/dc/elements/1.1/creator">

<rdfs:label xml:lang="en-US">Creator</rdfs:label>

<rdfs:comment xml:lang="en-US">An entity primarily responsible for making the content of the resource.</rdfs:comment>

<dc:description xml:lang="en-US">Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.</dc:description>

<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/"/>

<dcterms:issued>1999-07-02</dcterms:issued>

<dc:type rdf:resource="http://dublincore.org/usage/documents/principles/#element"/>

</rdf:Property>


Simplicity, contradiction, trust

• In RDF, meaning is expressed by simple statements:
  – Subject-Predicate-Object
• Anyone on Web can assert (in RDF sense) anything about anything
  – software agents navigating Web of statements
  – may be able to process some of these statements but not all
  – ignore the statements you don't understand
  – tolerance of inconsistency and errors
• Establishing trust as fundamental part of Semantic Web infrastructure
  – Who said this (and when etc)


Metadata and the Semantic Web

• Argued that the Semantic Web principles fit the nature of metadata
  – Metadata supports many different functions
  – Metadata is inherently "modular"
  – Metadata creation is not a one-off act, but an ongoing, distributed process
    – the metadata creator can't predict how users may want to use resources and query metadata
    – new uses of resources result in new metadata
  – Metadata is not (or at least not only) "objective", "authoritative" information
    – Some attributes represent interpretations
    – Some attributes are context-dependent
    – Multiple (even conflicting) descriptions can co-exist


Some RDF applications


RDF Site Summary (RSS) 1.0

• Simple RDF metadata vocabulary designed to support syndication of "news" items
• An RSS "channel" is published as an RDF/XML document
• Provides metadata about
  – The channel itself
    – A summary of its scope and purpose
  – A sequence of items
    – Summary descriptions of Web documents
• Content of channel regularly updated by provider
• Wide, simple, automated distribution

http://purl.org/rss/1.0/
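As an illustration of what an aggregator or news reader does with a channel, the sketch below reads a hand-written, minimal RSS 1.0 document with Python's standard library. The channel, its URLs and its single item are invented for the example, and the function name `read_channel` is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "rss": "http://purl.org/rss/1.0/"}

# A hypothetical, minimal channel: metadata about the channel, then its items.
CHANNEL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news.rss">
    <title>Example news</title>
    <link>http://example.org/</link>
    <description>A hypothetical channel.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/item/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/item/1">
    <title>First item</title>
    <link>http://example.org/item/1</link>
  </item>
</rdf:RDF>"""


def read_channel(xml_text):
    """Return the channel title and a list of (item title, item link) pairs."""
    root = ET.fromstring(xml_text)
    channel_title = root.findtext("rss:channel/rss:title", namespaces=NS)
    items = [(item.findtext("rss:title", namespaces=NS),
              item.findtext("rss:link", namespaces=NS))
             for item in root.findall("rss:item", NS)]
    return channel_title, items


title, items = read_channel(CHANNEL)
print(title)
print(items)
```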


RDF Site Summary (RSS) 1.0

• Typical applications
  – Web sites: render content of specific channels as part of their own Web sites
  – Online aggregator services: harvest numerous channels and provide search/filtering services across the items
    – e.g. Meerkat
  – Desktop news readers: allow users to "subscribe" to list of channels, regularly download content for user to browse
    – e.g. Amphetadesk
• RSS also generated from some Weblog management systems
  – SWAD(E) activity on "semantic weblogging"


http://www.ukoln.ac.uk/


Metadata schema registries

• How to encourage convergence and reuse of metadata vocabularies
• Implementers
  – may be unaware of existing vocabularies
  – adapt/customise "standard" terms for application-specific use
  – may combine terms from multiple "standard" sources
  – coin application-specific terms or extensions
• Application profile
  – A metadata element set optimised for a particular application


Metadata schema registries

• A publication context for
  – "standard" metadata vocabularies and their terms
  – (depending on scope of registry) also implementer usages/adaptations of those vocabularies and their terms
  – To provide a "dictionary" function
  – To highlight relationships, encourage reuse/convergence
• Based on indexing RDF data distributed on Web?
• Requires shared conventions for describing
  – metadata vocabularies
  – and their usages and adaptations


http://dublincore.org/dcregistry/


Summary

• RDF provides a common framework for making machine-processable statements about resources

• The “Semantic Web” provides a vision of metadata as
  – modular, extensible
  – distributed, devolved
  – dynamic, evolving

• Seeks to address (some of) the challenges of cross-domain, cross-community interoperability

• Fundamental role of trust on the Semantic Web


Overall summary

• Global networks have created a new context for the delivery of services
• Metadata fundamental to service provision
• Services being built (successfully!)
  – OAI PMH as a low-barrier technology
• No one-size-fits-all solution
• Debates, tensions, balances…
  – automated processes v human labour
  – domain-specific richness v cross-domain (over-?)simplicity
  – standards v their implementation
  – objectivity v subjectivity
  – centralisation v distribution
• Emergence of a Semantic Web?


Acknowledgements

Parts of the content of this presentation are adapted from earlier presentations by:

Tom Baker (Fraunhofer-Gesellschaft, Berlin),

Michael Day, Rachel Heery, Paul Miller, and Andy Powell (UKOLN)


Acknowledgements

UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.

http://www.ukoln.ac.uk/