An introduction to metadata for libraries, museums and archives
Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003
Pete Johnston
UKOLN, University of Bath
Bath, BA2 7AY
UKOLN is supported by:
http://www.ukoln.ac.uk/
Section 1 : An Introduction to Metadata
An introduction to Metadata
• Memory institutions, network services and metadata
• What is metadata?
• Exposing/sharing metadata
• Exposing/sharing metadata: semantics
  – the Dublin Core Metadata Initiative
Memory institutions, network services and metadata
Memory institutions
Museums, libraries and archives—often called memory institutions—are trusted organizations that collectively document the entire range of human experience and expression.
Memory institutions are engaged in the important work of:
• Capturing, authenticating, and making sense of cultural memory;
• Preserving the human record for future generations; and
• Sharing knowledge to support education and learning.
http://www.ukoln.ac.uk/interop-focus/ccs/positions/
Delivering services
• Memory institutions provide services to users
  – (At least some of) these services provide access to resources
• Emergence of services built on global networks
  – remote access to digital resources for all (potentially…)
  – resources available “round the clock”
  – resources comparable to other digital resources from elsewhere
• Investment in
  – digitisation of cultural content
  – network services providing access to digitised content
Delivering services
• Potential for new types of service
  – “digital libraries”, “virtual museums” etc
  – integrated access to resources from multiple remote content providers
  – services defined by theme/subject/activity/audience etc, not by location/source
  – “packaging” and re-purposing of content
  – user-oriented rather than provider-oriented
• Changing user expectations
  – user wants information relevant to task/activity
    – may see structural/organisational boundaries of content providers as unimportant!
  – user wants access from any location
  – user wants access at any time
Delivering services
• Move from web sites to “portals”
  – “A network service that provides a personalised, single point of access to a range of heterogeneous network services, local and remote, structured and unstructured”
    – Andy Powell, 2002
• Content providers exposing content for delivery through multiple services, channels
• Presentation services “surfacing” content from multiple (distributed) sources
• Memory institutions may perform both roles
• Move away from “silo mentality” towards more “joined-up” approaches
Resource discovery on the Web
• Broadly two approaches to providing discovery services
  – software indexing of resource content
  – human description of resources
• Web search engines
  – software agents (robots) retrieve documents by following hyperlinks (crawling)
  – index text of documents
  – make index available as searchable database
  – some clever ranking algorithms
    – e.g. Google infers “Page Ranking” based on links to document
    – “find pages which link to page X”
    – “find pages similar to X”
Resource discovery on the Web
• Web search engines
  – tend to generate many results
    – and may suffer from “spamming”
    – ranking algorithms may help
  – don’t support “structured search”
    – search on author name
    – search on document type (“journal article”)
  – limited to textual resources
    – generally, poor support for search for multimedia objects
• “The hidden Web”
  – robots may not crawl documents dynamically generated from databases/CMS
Resource discovery on the Web
• But automated indexing
  – is low cost
    – at least compared to human resource description
  – (usually) scales to large numbers of resources
  – can be a useful tool!
• Challenge of finding appropriate balance of approaches for context
Metadata for services
• Metadata has been important to “traditional” service provision…
• … is essential component of effective network services
What is metadata?
What is metadata?
• Simple definitions…
• ‘Structured data about data’.
  – Dublin Core Metadata Initiative FAQ, 2003
• Machine-understandable information about Web resources or other things.
  – Tim Berners-Lee, W3C, 1997
Towards a “functional” view of metadata
• Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person.
– Lorcan Dempsey & Rachel Heery, 1998
• Structured data about resources that can be used to help support a wide range of operations
– Michael Day, 2001
What resources, objects, things?
• HTML documents
• digital images
• databases
• books
• museum objects
• archival records
• metadata records
• Web sites
• collections
• services
• physical places
• people
• institutions
• abstract “works”
• concepts
• events
• Metadata might exist for almost anything
  – digital, physical, “abstract” resources
What resources, objects, things?
• Metadata records include
  – bibliographic records in library catalogues or from abstracting & indexing services
  – descriptions of archival material in archival finding aids
  – object records in museum documentation / collection management systems
  – entries in directories of organisations, individuals and services
  – descriptions of digital objects (documents, images, software)
  – descriptions of collections of digital objects
  – descriptions of network services
  – descriptions of metadata records
What operations?
• Operations by human users, software tools
• Metadata might be used to support many different functions
  – resource disclosure & discovery
  – resource management, including preservation
  – intellectual property rights management
  – commerce
  – authentication and authorisation
  – personalisation and localisation of services
• Different functions require different types/classes of metadata
  – No “one size fits all solution”
  – Need to specify “functional requirements”
Metadata elements & element sets
• Metadata describes attributes or properties of a resource
• Each attribute or property is described by a metadata element
  – Can be identified, formally documented/defined
  – May be represented in different forms
• A metadata element set
  – coherent bounded set of elements formulated as basis for metadata creation
  – created for purpose, as a unit
• Schema
  – structured representation of an element set
Metadata for resource discovery
• User wishes to
  1. discover resources according to some criteria
  2. (optionally) identify a specific resource
     – confirm that resource described is resource sought
     – distinguish similar resources
  3. select
     – evaluate, choose resource appropriate to needs
  4. locate resource
  5. obtain/access resource
  6. use resource
     – open, read, display, run, play, copy, unpackage/repackage
     – interpret content
• Resource discovery metadata supporting (primarily) operations 1 - 4
Metadata for resource discovery
Continuum of complexity/functionality:
• Full-text indexes
  – might not be classed as “metadata” by some!
  – generated by software tools
  – functions: discovery (by content), location
• Semantically simple forms (e.g. Dublin Core)
  – typically covering description of broad range of resources
  – may be partly generated automatically, partly human authored
  – functions: discovery, identification, selection, location
• Richer, complex forms (e.g. MARC, EAD, CIMI-SPECTRUM, AMICO etc)
  – typically covering specific types of resources
  – often associated with particular community/domain
  – creation may involve relatively high degree of human expertise
  – functions: discovery, identification, selection, location, access, use (which may be type specific)
Association of resource and metadata (1)
Metadata embedded in resource
• e.g. meta elements in HTML docs; summary properties in word processor docs
• Can resource support embedding of metadata?
• Does metadata creator have write access to resource?
• Can service extract embedded metadata?
• Metadata about aggregates of resources?
• Metadata about people, places, concepts?
[Diagram: Resource 1 with embedded metadata: Creator = J Smith; Date = 2001-11-05; Title = Report]
Association of resource and metadata (2)
Metadata record as separate object; record identifier embedded in resource
• e.g. link elements in HTML docs
• Metadata record may be remote from resource
• Can resource support embedding of link?
• Does metadata creator have write access to resource?
• Can service follow link to metadata record?
• What happens when resource deleted?
• Metadata about aggregates of resources?
• Metadata about people, places, concepts?
[Diagram: Resource 1 carries "Metadata rec = 1"; Metadata rec 1 holds Creator = J Smith; Date = 2001-11-05; Title = Report]
Association of resource and metadata (3)
Metadata record as separate object; resource identifier in metadata record
• Metadata record may be remote from resource
• Does not require embedding of metadata or link
• Does not require metadata creator to have write access to resource
• Metadata record created independently of resource – possibly multiple records
• Service uses metadata records independently of resource
• Metadata record may persist after resource deleted
• Metadata record can describe anything (with identifier…)
[Diagram: Resource 1; Metadata rec 1 holds Doc = 1; Creator = J Smith; Date = 2001-11-05; Title = Report]
Metadata as managed resource
• Metadata record is used separately from resource described
• Recognition that metadata is resource to be managed, separately from resource described
• Metadata content stored in “database”, exposed in form(s) appropriate for service(s)

Doc  Creator  Date        Title
1    J Smith  2001-11-05  Report
Exposing/sharing metadata
How is metadata exposed/shared?
• Resource description “communities”
  – characterised by consensus on conventions for internal exchange of metadata
• Metadata for resource discovery
  – is used beyond its creator community
  – is combined/compared with metadata from other communities
  – is aggregated or cross-searched by services
• How does a content provider make metadata records available in a commonly understood form?
• How does a service provider obtain these metadata records from data providers?
How is metadata exposed/shared?
• Effective sharing of information expressed in metadata record requires agreement on
  – metadata semantics
    – what metadata elements mean
  – metadata structure
    – data model, relationships of component parts
  – metadata syntax
    – rules of expression
  – protocols
    – how metadata records transmitted between content provider and service provider
• Agreements formalised as specifications and standards (ideally…)
Exposing/sharing metadata: semantics
Introducing the Dublin Core
Introducing the Dublin Core
• Initiative to improve resource discovery on Web
  – not for complex resource description
  – based on description of simple “document-like objects”
  – extended to other classes of resource
• International, cross-disciplinary consensus on simple element set
  – 15 elements
  – all optional
  – all repeatable
http://dublincore.org/
Introducing the Dublin Core (2)
• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
Dublin Core: creator
• Term Name: creator
• Label: Creator
• Definition: An entity primarily responsible for making the content of the resource.
• Comment: Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.
• Type of Term: element
• Status: recommended
• Date issued: 1999-07-02
• URI: http://purl.org/dc/elements/1.1/creator
Dublin Core: date
• Term Name: date
• Label: Date
• Definition: A date associated with an event in the life cycle of the resource.
• Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
• Type of Term: element
• Status: recommended
• Date issued: 1999-07-02
• URI: http://purl.org/dc/elements/1.1/date
Standardisation of Dublin Core
CEN Workshop Agreement (EU)
• 2000: Dublin Core elements endorsed as CWA 13874
• Usage guidelines for European industry
NISO Z39.85 (USA)
• 2001: National Information Standards Organization, an ANSI affiliate
ISO
• 2002: Dublin Core Metadata Element Set approved as ISO 15836
Using the Dublin Core
• Tom Baker, “A Grammar of Dublin Core”, D-Lib Magazine, October 2000
• Metaphor of metadata as language
• DC as a simple “pidgin” language for use by “tourists on the Internet commons”
• Small vocabulary, simple grammar/structure
  – This Resource has Title “An introduction to metadata”
  – This Resource has Subject “Resource discovery”
• Not subtly expressive, but easy to learn and deploy - “good enough” to work
Using the Dublin Core
• Designed for simplicity of semantics, ease of use
• Provides basic semantic interoperability
  – semantics sufficiently general to be useful across domains
• Can provide 15 “windows” into richer resource descriptions
  – disclose rich description in simple form
  – semantic cross-walks, mappings
Using the Dublin Core
[Diagram: a rich description viewed through a simple DC description (title, creator, date, desc, rights)]
Qualifying Dublin Core
• Allows for controlled extensibility through “qualifiers”
  – Element refinements
    – make element meanings narrower, more specific:
      – a Date Created versus Date Modified
      – an IsReplacedBy versus Replaces Relation
  – Encoding schemes
    – provide contextual information or parsing rules that aid in the interpretation of a value
    – may specify that a value is drawn from a controlled vocabulary (e.g. LCSH, TGN etc)
    – may specify that a value is formatted in accordance with a specified notation (e.g. date formats)
Qualifying Dublin Core
• Qualifiers make elements more specific
  – Element Refinements narrow meanings, never extend
  – Encoding Schemes give context to element values
• The “dumb-down” rule
  – Application should be able to use the value as if it were unqualified
  – Ignore unknown Encoding Schemes
  – Resolve (semantically more specific) Element Refinements to (more generic) Elements
• Some loss of specificity, but still generally correct and useful for discovery
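The "dumb-down" rule can be illustrated with a minimal Python sketch. The REFINES table, the (term, encoding scheme, value) tuple layout and the example.org URI are assumptions made for this illustration; they are not part of any DCMI specification or library.

```python
# Hypothetical lookup from element refinements to the elements they refine.
REFINES = {
    "created": "date",
    "modified": "date",
    "valid": "date",
    "isReplacedBy": "relation",
    "replaces": "relation",
}


def dumb_down(statements):
    """Reduce qualified statements to simple (element, value) pairs.

    Each input statement is a (term, encoding_scheme, value) tuple.
    """
    simple = []
    for term, _scheme, value in statements:
        element = REFINES.get(term, term)  # resolve a refinement to its element
        simple.append((element, value))    # the encoding scheme is ignored
    return simple


qualified = [
    ("title", None, "Report"),
    ("created", "W3CDTF", "2001-11-05"),
    ("isReplacedBy", "URI", "http://example.org/report-v2"),
]
simple = dumb_down(qualified)
```

The result keeps every value but loses the distinction between, say, a creation date and a modification date, which is the loss of specificity the slide refers to.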
Dublin Core: valid
• Term Name: valid
• Label: Valid
• Definition: Date (often a range) of validity of a resource.
• Type of Term: element-refinement
• Status: recommended
• Date issued: 2000-07-11
• URI: http://purl.org/dc/terms/valid
Using the Dublin Core
• Not a replacement for richer descriptive standards
• But useful
  – If you wish to disclose community-specific metadata to other communities using commonly understood semantics
  – If you wish to provide integrated access to your own metadata databases with different underlying semantics
  – If you only need simple metadata semantics
Using the Dublin Core
• Inherent tensions in DC
  – Broad, fuzzy “search buckets” or rigidly prescribed usage?
  – Generic applicability across domains or intra-domain precision?
  – One-size-fits-all or customise-as-you-please?
  – Simply discovering resources (a few typical search attributes) or describing them fully (lots of detail)?
  – Dublin Core primarily as a native record format or extracted from richer metadata?
  – Broad-brush minimalism or comprehensive structuralism?
Summary
• Emergence of global networks enables new approaches to providing access to resources
  – Increasing requirement to provide resource discovery across boundaries
• Metadata supports many functions, including resource discovery
• DC as simple, cross-disciplinary metadata element set
• Next:
  – How metadata records are represented: syntax/structure
  – How metadata records are exposed/shared/used in resource discovery services
Section 2 : Sharing metadata: XML and the OAI Protocol for Metadata Harvesting
Sharing metadata : XML and OAI
• Exposing/sharing metadata: syntax and structure
  – Extensible Markup Language (XML)
  – XML Schema
• Metadata harvesting
  – The Open Archives Initiative Protocol for Metadata Harvesting
• Some OAI-based services
• Developing metadata-based services
Exposing/sharing metadata: syntax and structure
XML & XML Schema
Embedding DC metadata in (X)HTML
• Dublin Core metadata can be embedded into (X)HTML documents
  – Simple to deploy but may be difficult to manage, maintain
• But almost none of the Web search engine services index it
• Lack of trust in “open” Web context
  – Abuse by content providers seeking to improve the ranking of their documents
• However, may be useful technique in “closed” context
  – e.g. single Web site or where control over which documents indexed
Embedding DC metadata in (X)HTML
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.Title" lang="en" content="Expressing Qualified Dublin Core in HTML/XHTML meta elements" />
<meta name="DC.Creator" content="Andy Powell, UKOLN, University of Bath" />
<meta name="DC.Date.Issued" scheme="W3CDTF" content="2002-09-09" />
<meta name="DC.Identifier" scheme="URI" content="http://dublincore.org/documents/dcq-html/" />
<meta name="DC.Format" scheme="IMT" content="text/html" />
<meta name="DC.Type" scheme="DCMIType" content="Text" />
</head>
<body>
</body>
</html>
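In a "closed" context, a local indexer could pull these embedded elements back out of the page. The following is a minimal sketch using Python's standard html.parser module; the DCMetaParser class name and the shortened sample page are assumptions for illustration only.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (name, scheme, content) for meta elements whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also reach this method, because the
        # default handle_startendtag calls handle_starttag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.records.append(
                (name, attributes.get("scheme"), attributes.get("content"))
            )


# Shortened sample page, assumed for this sketch.
page = """<html><head>
<meta name="DC.Title" lang="en" content="Report" />
<meta name="DC.Date.Issued" scheme="W3CDTF" content="2002-09-09" />
<meta name="keywords" content="not Dublin Core" />
</head><body></body></html>"""

parser = DCMetaParser()
parser.feed(page)
```

After the call to feed, parser.records holds one tuple per DC meta element, in document order; meta elements without the DC. prefix are skipped.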
Introducing XML
• Extensible Markup Language
  – Recommendation of W3C, 1998, 2000
• Defines means of describing tree-structured data in text-based format
  – embedded markup delimits and describes data
• Simple, platform-independent syntax
• Standard programming interfaces
  – reusable software components
• Support from major software vendors
• Widely adopted for transferring data between programs, systems
<table>
<record>
<doc>1</doc>
<creator>J Smith</creator>
<date>2001-11-05</date>
<title>Report</title>
</record>
</table>
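The round trip from table rows to XML text and back can be sketched with Python's standard xml.etree.ElementTree module. The function names serialise and deserialise are assumptions for this illustration; the element names follow the sample record.

```python
import xml.etree.ElementTree as ET


def serialise(rows):
    """Turn a list of row dictionaries into a <table> XML string."""
    table = ET.Element("table")
    for row in rows:
        record = ET.SubElement(table, "record")
        for name, value in row.items():
            ET.SubElement(record, name).text = value
    return ET.tostring(table, encoding="unicode")


def deserialise(xml_text):
    """Rebuild the list of row dictionaries from the XML string."""
    table = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in record}
        for record in table.findall("record")
    ]


rows = [{"doc": "1", "creator": "J Smith", "date": "2001-11-05", "title": "Report"}]
xml_text = serialise(rows)      # what would be transmitted
restored = deserialise(xml_text)  # what a remote application would rebuild
```

A parser rejects text whose start and end tags do not match, so a record with mismatched tags would fail at the ET.fromstring step rather than being silently misread.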
[Diagram: the same record shown as a table row (Doc = 1, Creator = J Smith, Date = 2001-11-05, Title = Report) and as a tree (table → record → doc, creator, date, title)]
[Diagram: table rows (Doc, Creator, Date, Title) → Serialisation as <record>…</record> elements → Transmission → De-serialisation → Remote application]
XML and interoperability
• “Meta-language”
  – language for describing markup languages
  – can define unlimited number of markup languages
• But….
  – XML says nothing about what your names mean
  – will a software agent process my <doc> XML element correctly?
• Interoperability requires consensus on
  – the names of components (XML elements and attributes)
  – the structural model of a class of document
  – the semantics represented by the components and the structure
• Shared use of common XML “schemas”
XML schemas
• Means to codify syntax/structure rules for class of XML document
  – what markup is allowed
  – structural constraints on use of markup
• Document Type Definition (DTD)
  – part of XML Recommendation
• W3C XML Schema
  – W3C recommendation
  – data-typing i.e. tighter control on element content
  – support for XML Namespaces
  – uses XML syntax
• Software can validate instance against DTD/schema
Metadata harvesting: The Open Archives Initiative Protocol for Metadata Harvesting
Searching & harvesting
• Resource discovery services operating across the resources of multiple distributed content providers
• Possible strategies
  – Distributed search
    – submit parallel queries to multiple metadata databases
    – collate multiple result sets for presentation to user
  – Harvest
    – gather metadata records from multiple providers into single database
    – (periodic re-gathering to refresh data)
    – query central database
• Performance issues in cross-searching
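The difference between the two strategies can be shown with a toy Python sketch. The PROVIDERS dictionary stands in for remote metadata databases, and all names and records here are invented for illustration; a real service would query providers over a network protocol.

```python
# In-memory stand-ins for two remote metadata databases (invented data).
PROVIDERS = {
    "library": [{"title": "Report on metadata", "creator": "J Smith"}],
    "museum": [
        {"title": "Catalogue of objects", "creator": "A Jones"},
        {"title": "Metadata for museums", "creator": "B Brown"},
    ],
}


def matches(record, term):
    return term.lower() in record["title"].lower()


def distributed_search(providers, term):
    """Query every provider at search time and collate the result sets."""
    results = []
    for name, records in providers.items():
        results.extend((name, r["title"]) for r in records if matches(r, term))
    return results


def harvest(providers):
    """Gather all records into one central index ahead of any search."""
    return [(name, r) for name, records in providers.items() for r in records]


def search_central(index, term):
    """Answer a query from the central index only."""
    return [(name, r["title"]) for name, r in index if matches(r, term)]


central_index = harvest(PROVIDERS)
from_distributed = distributed_search(PROVIDERS, "metadata")
from_harvest = search_central(central_index, "metadata")
```

Both routes return the same hits here. In the distributed case every user query touches every provider, while in the harvest case providers are contacted only when the index is refreshed, so results can be stale between refreshes.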
Introducing OAI
• Open Archives Initiative
  – develops/promotes interoperability standards to facilitate dissemination of content
  – roots in “e-prints” community seeking to improve access to scholarly publications
    – Deposit pre-prints – for quicker dissemination
    – Deposit post-prints – to reduce institutional costs, maximise impact
  – e-print “archives”
    – institutional
    – federated subject/discipline-based
  – required simple low-cost interface to expose metadata for reuse
http://www.openarchives.org/
Introducing OAI (2)
• Terminology
  – “Archive” = repository, not archive
  – “Open” in terms of architecture, not free/unlimited access to repository
• Protocol for Metadata Harvesting (OAI-PMH)
  – Developed by international technical committee, 1999-2002
  – Shift from “optimising discovery of e-prints” to more generic resource discovery
  – OAI “committed to version 2.0 as a production release”
Introducing OAI PMH
• Lightweight, low-cost protocol which allows data providers to expose metadata records for retrieval by service providers
• Service providers can say “give me all/some of your metadata records”
• Built on HTTP, XML
  – Six verbs: requests from service provider to data provider sent using HTTP GET/POST
  – responses from data provider to service provider as XML documents
• Not a distributed search protocol
• Not limited to e-print archives
Introducing OAI PMH (2)
• Supports transfer of metadata records
  – resources made available separately
  – identifier/locator of resources typically included in metadata record
• Data provider must provide simple/unqualified DC metadata record
  – may provide metadata records in other “formats”
  – metadata formats must be associated with a W3C XML Schema
• Extensible framework for metadata about
  – repository, sets, records
• Metadata and resources often freely available
  – but not a requirement
Introducing OAI PMH (3)
• Supports selective harvesting
  – by sets
  – by datestamps
• Example
  – Service Provider: List all records added since Jan 1 2002 in simple DC format (oai_dc)
    – verb = ListRecords
    – from = 2002-01-01
    – metadataPrefix = oai_dc
    – http://www.myarchive.org/cgi-bin/oai?verb=ListRecords&from=2002-01-01&metadataPrefix=oai_dc
– Data Provider: Returns XML document containing records
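The request in the example can be assembled programmatically. This is a minimal sketch using Python's standard urllib.parse module; the helper name list_records_url is an assumption, and the base URL is the illustrative one from the slide, not a live repository.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest by datestamp
    if set_spec:
        params["set"] = set_spec    # harvest by set
    return base_url + "?" + urlencode(params)


url = list_records_url(
    "http://www.myarchive.org/cgi-bin/oai", "oai_dc", from_date="2002-01-01"
)
query = parse_qs(urlsplit(url).query)
```

The arguments come out in a different order from the slide's URL, which should not matter to a data provider because they are ordinary query-string key/value pairs.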
[Diagram: content providers (each with Resources, Metadata and Website) expose DC metadata over OAI-PMH to several Portal Websites]
OAI DC metadata record (from Library of Congress Repository 1)
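<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">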
<dc:title>Empire State Building. [View from], to Central Park</dc:title>
<dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
<dc:date>1932 Jan. 19</dc:date>
<dc:type>image</dc:type>
<dc:type>two-dimensional nonprojectible graphic</dc:type>
<dc:type>Cityscape photographs.</dc:type>
<dc:type>Acetate negatives.</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
<dc:coverage>United States--New York (State)--New York.</dc:coverage>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
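A service provider that has harvested such a record needs to read the repeatable DC elements out of it. The following minimal sketch uses Python's standard xml.etree.ElementTree module on a shortened copy of the record; the dc_values helper is an assumption for illustration, and the namespace declarations are included because a namespace-aware parser requires them.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the sample record, with namespace declarations added.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Empire State Building. [View from], to Central Park</dc:title>
  <dc:type>image</dc:type>
  <dc:type>Cityscape photographs.</dc:type>
  <dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
</oai_dc:dc>"""


def dc_values(xml_text):
    """Map each DC element name to the list of its values (elements are repeatable)."""
    root = ET.fromstring(xml_text)
    values = {}
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        namespace, _, local = child.tag[1:].partition("}")
        if namespace == DC_NS:
            values.setdefault(local, []).append(child.text)
    return values


record = dc_values(RECORD)
```

Storing each element as a list reflects the "all optional, all repeatable" design: dc:type appears more than once in the record, and absent elements are simply missing from the dictionary.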
Some OAI based services
Resource Discovery Network (RDN)
• Co-operative network of “subject gateways”
  – Funded by JISC for HE and FE
• Seven “hubs”
  – ALTIS: Hospitality, Leisure, Sport and Tourism
  – BIOME: Health and Life Sciences
  – EEVL: Engineering, Mathematics and Computing
  – GESource: Geography and Environment
  – Humbul: Humanities
  – PSIgate: Physical Sciences
  – SOSIG: Social Sciences, Business and Law
• Databases of metadata records describing Internet resources selected for high quality
http://www.rdn.ac.uk/
Resource Discovery Network (RDN)
• Hubs as subject communities
  – metadata creators are subject specialists
  – good links with users
  – separate metadata schemas
• Hubs provide their own Web interfaces
  – search databases
  – other services: tutorials, guides, alerting etc
• But operate within a shared policy framework
  – collection development
  – cataloguing guidelines
  – technical standards
  – agreements on IPR
Resource Discovery Network (RDN)
• RDN Resource Finder
  – Cross-search of Hubs’ metadata records
  – Initially distributed search using Z39.50
    – Performance issues
    – Difficult to build flexible browse interface
  – Now using OAI PMH to harvest records
    – Currently harvesting simple DC
    – Basic keyword searching
    – Exploring harvesting some richer record formats for additional functionality
• Also some sharing of metadata
  – between Hubs (DC plus extensions)
  – between Hubs and other similar services (LOM)
  – but Hubs’ metadata not freely available for harvest
Resource Discovery Network http://www.rdn.ac.uk/
e-Prints UK
• JISC-funded project, 2002-2004
• Provide access to e-prints via subject-based RDN services
• Harvest metadata from e-print archives
  – institutional, non-institutional, personal
• Automatically enhance harvested metadata (using Web Services)
  – Add (or validate) authoritative forms of author names (OCLC)
  – Assign subject classification (based on analysis of full-text of resource) (OCLC)
  – Generate OpenURLs from citations (based on analysis of full-text of resource) (Univ of Southampton/UKOLN)
http://www.rdn.ac.uk/projects/eprints-uk/
e-Prints UK
• Provide search services
  – across all metadata
  – subject-partitioned search services for Hubs
• Enhanced metadata records made available to originating e-print archive
• Note
  – service provider enhancing harvested metadata to provide more functionality
  – some of enhancement process requires access to resource as well as metadata record
  – two-way flow of metadata records
  – recommendations for how to use simple DC to describe e-prints to maximise benefits of metadata disclosure
e-Prints UK
[Diagram: e-print archives (institutional, non-institutional, personal) connect to e-Prints UK via OAI-PMH; e-Prints UK uses a subject classification service and a name authority service (Web services offered by OCLC) and a citation analysis service (Web service offered by Southampton) via SOAP; RDN gateway/portal services provide end-user services through the RDN via SOAP, Javascript/HTTP and Z39.50]
Developing metadata-based services
Developing services
• Consensus on metadata semantics/syntax, transport protocols etc as minimal requirements
• Resource selection
  – collections policies
• Metadata quality assurance
  – “cataloguing rules”
    – mandatory elements, minimum-level records
    – guidance on content of values of elements: formats, controlled vocabularies, identifiers etc
  – Maintenance, currency of metadata
• Agreements on IPR, usage rights, “branding”
  – for metadata records as well as resources
Developing services
• DCMES intended to be simple enough for creation by untrained creators
– assumption that metadata creation straightforward?
• Recognition that precision in services depends on quality of metadata
• Subject terms/classification difficult for non-expert
• Different services providing different functionality to different audiences may require different metadata
Developing services
• Human creation of metadata is not cheap!
• Where possible, use automated methods to
  – Generate metadata
  – Normalise/enhance metadata
• Service providers as well as data providers can contribute (e.g. e-prints UK)
• Reuse/repurpose metadata
• Where human creation required, provide support
  – Education, guidelines
  – Appropriate software tools
Developing services
• Service developers use/implement metadata standards in pragmatic way
• Standards creators concerned with
  – Consensus, commonality, interoperability
  – e.g. DCMES
• Implementers concerned with
  – Functionality, specificity, localisation
  – e.g. “Using simple DC to describe e-Prints”
• “Application profile”
  – A metadata element set optimised for a particular application
Summary
• Standards for metadata semantics
• XML as syntax for metadata exchange, but requires consensus on structures
• Harvesting model as alternative to distributed search
– OAI PMH
• Service provision
– metadata quality
– rights issues
– application profiles
• Next:
– A common framework for metadata?
– Towards the “Semantic Web”?
Section 3 : Sharing metadata: RDF and the Semantic Web
Sharing metadata: RDF & the Semantic Web
• Is there a problem?
• The vision of the “Semantic Web”
• Introducing RDF
• Some RDF applications
The problem with XML?
• XML as a mechanism for expressing tree-structured data
• Different communities make different design choices for the meaning of their trees
– All “good” (and valid against their XML DTD/Schema)
• Within resource description community, meaning(s) of structure(s) may be limited
• But applications working across communities have to work with multiple XML trees
– potentially unlimited
– not scalable in an “open” Web environment?
– how to manage ever increasing set of conventions
– always encountering new structures/schemas
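The scaling problem can be made concrete with a small sketch. The two XML structures below are invented for illustration (they are not taken from any real schema); each is a reasonable design, yet an application has to carry a separate reader for each one and will fail on any structure it has not met before.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, equally reasonable XML designs for the same fact.
COMMUNITY_A = '<record><creator>John</creator></record>'
COMMUNITY_B = '<doc><authors><person name="John"/></authors></doc>'


def author_from_a(xml_text):
    """Community A puts the author in the text of a <creator> child."""
    return ET.fromstring(xml_text).findtext("creator")


def author_from_b(xml_text):
    """Community B puts the author in an attribute, two levels down."""
    return ET.fromstring(xml_text).find("authors/person").get("name")


# The application needs one reader per structure, chosen by root element name.
READERS = {"record": author_from_a, "doc": author_from_b}


def author(xml_text):
    root_name = ET.fromstring(xml_text).tag
    reader = READERS.get(root_name)
    if reader is None:
        raise ValueError(f"no reader for structure rooted at <{root_name}>")
    return reader(xml_text)


print(author(COMMUNITY_A), author(COMMUNITY_B))
```

Every new community convention means another entry in `READERS`, which is the "ever increasing set of conventions" noted above.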
The “Semantic Web”
• Activity of World Wide Web Consortium (W3C)
• To make data available on the Web in a form which is easier for machines to process
– Machine-processable statements about all kinds of things (Web pages, organisations, people, concepts, products, etc) and the relationships/links between them
• To share data between programs and systems designed independently
– Unlock the data held in databases
– Link data from different sources
– To enable richer more flexible services
http://www.w3.org/2001/sw/
The “Semantic Web”
• Builds on
– use of Uniform Resource Identifiers (URIs) to uniquely identify resources
– the Resource Description Framework (RDF) as a common model for expressing information about resources
– an XML syntax for representing RDF data
– existing Web protocols (HTTP) for transferring data
Introducing RDF
Introducing RDF
• Resource Description Framework
– Model & Syntax, W3C Recommendation, 1999
– RDF Core WG activity, 2001-2003
• Set of revised/expanded specifications currently (April 2003) in “last call”
– Semantics: formal model
– Concepts: abstract syntax (graph)
– RDF/XML syntax: conventions for encoding statements using XML
– Test Cases
– Vocabulary Description Language
– Primer: introduction
http://www.w3.org/RDF/
Introducing RDF (2)
• Provides generic framework for representing information about resources
– set of conventions/infrastructure for applications exchanging metadata
– allows semantics to be defined by different resource description communities
– accommodates mixing of information from diverse sources
• Resource : any object identified by URI
– not necessarily accessible via Web
• Property : “attribute” to describe resource
– properties also uniquely identified by URI
• Statement : “triple” of specific resource, property, and value
The RDF model
[Diagram: http://example.org/doc/1 --author--> “John”]
A resource has some property whose value is either (i) a simple string value (literal)…
• The resource identified by the URI http://example.org/doc/1 has a property “author” whose value is “John”
• Or, “John” is the “author” of the resource identified by http://example.org/doc/1
The RDF model (2)
… or (ii) another resource...
[Diagram: http://example.org/doc/1 --author--> (unnamed resource); (unnamed resource) --name--> “John”; (unnamed resource) --email--> “[email protected]”]
• The value of property “author” is another resource which has a property “name” with value “John” and a property “email” with value “[email protected]”
The RDF model (3)
… which may itself have a URI
[Diagram: http://example.org/doc/1 has property “author” whose value is the resource http://example.org/person/john, which has properties “name” (value “John”) and “email”]
The RDF model (4)
Properties themselves are identified by URIs
[Diagram: http://example.org/doc/1 has property http://example.org/author whose value is the resource http://example.org/person/john, which has properties http://example.org/name (value “John”) and http://example.org/email]
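The model built up over the last four slides can be sketched in a few lines of Python. This is an illustration of the idea only, not an RDF toolkit: the `URIRef`, `Literal` and `Statement` types and the `values` helper are assumptions made for the sketch, and only the author and name statements from the diagram are included.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A resource or property identified by a URI."""


class Literal(str):
    """A simple string value."""


class Statement(NamedTuple):
    """One RDF statement: a (resource, property, value) triple."""
    subject: URIRef
    predicate: URIRef
    object: Union[URIRef, Literal]


EX = "http://example.org/"
doc = URIRef(EX + "doc/1")
john = URIRef(EX + "person/john")

# A graph is simply a set of statements.
graph = {
    Statement(doc, URIRef(EX + "author"), john),             # value is a resource
    Statement(john, URIRef(EX + "name"), Literal("John")),   # value is a literal
}


def values(graph, subject, predicate):
    """All values of one property of one resource."""
    return sorted(s.object for s in graph
                  if s.subject == subject and s.predicate == predicate)


# Follow the "author" arc from the document, then read the author's name.
author = values(graph, doc, URIRef(EX + "author"))[0]
print(values(graph, author, URIRef(EX + "name")))
```

Because the author is itself a resource with a URI, other statements about the same person can be attached to it later without changing the statements already made.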
The power of the RDF model
• Extensible model
– supports any vocabularies
• Supports arbitrary complexity of description
• URIs as unique “fixed points” to identify
– resources
– properties
• Descriptions created independently can be “merged” using URIs as “anchors”
– i.e. supports distributed metadata
First source
[Diagram: http://example.org/doc/1 has property “author” whose value is http://example.org/person/john, which has properties “name” (value “John”) and “email”]
Second source
[Diagram: http://example.org/doc/1 has property “subject” with value “XML”]
Third source
[Diagram: http://example.org/person/john has property “organisation” with value “JS Foundation”]
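[Diagram: a single graph in which http://example.org/doc/1 has properties “author” (value http://example.org/person/john) and “subject” (value “XML”), and http://example.org/person/john has properties “name” (value “John”), “email” and “organisation” (value “JS Foundation”)]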
Three descriptions merged
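Merging on URIs amounts to taking the union of the statements from each source. The sketch below assumes each description is held as a Python set of (subject, property, value) tuples; the short property labels stand in for property URIs, as on the slides, and the email statement is left out. The `describe` helper is hypothetical and assumes one value per property.

```python
DOC = "http://example.org/doc/1"
JOHN = "http://example.org/person/john"

# Three descriptions created independently (email statement omitted).
first = {(DOC, "author", JOHN), (JOHN, "name", "John")}
second = {(DOC, "subject", "XML")}
third = {(JOHN, "organisation", "JS Foundation")}

# Statements about the same URI line up automatically; no schema mapping needed.
merged = first | second | third


def describe(graph, subject):
    """Collect the properties of one resource (assumes one value per property)."""
    return {p: o for s, p, o in graph if s == subject}


# Start from the document, follow "author", and see what all sources say.
author = describe(merged, DOC)["author"]
print(describe(merged, DOC))
print(describe(merged, author))
```

The second `print` shows information from the first and third sources side by side, even though neither source knew about the other.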
A simple DC metadata record (the “hedgehog”)
[Diagram: http://example.org/doc/1 at the centre, with one arc for each of the fifteen DC elements: dc:title, dc:creator, dc:subject, dc:description, dc:publisher, dc:contributor, dc:date, dc:type, dc:format, dc:identifier, dc:source, dc:language, dc:relation, dc:coverage, dc:rights]
The RDF XML syntax
• XML representation of model
– to store/exchange descriptions
• Use of XML Qualified Names and XML Namespaces to represent URIs in RDF/XML
• Conventions for the meaning of structures in RDF/XML document
• Service can “know in advance” the meaning of structures in RDF/XML document
– i.e. always represents RDF graphs
– even if unanticipated vocabularies used
– can read multiple descriptions into store and “merge” on URIs
A simple DC metadata record (RDF/XML)
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:creator>a</dc:creator>
    <dc:contributor>b</dc:contributor>
    <dc:publisher>c</dc:publisher>
    <dc:subject>d</dc:subject>
    <dc:description>e</dc:description>
    <dc:identifier>f</dc:identifier>
    <dc:relation>g</dc:relation>
    <dc:source>h</dc:source>
    <dc:rights>i</dc:rights>
    <dc:format>j</dc:format>
    <dc:type>k</dc:type>
    <dc:title>l</dc:title>
    <dc:date>m</dc:date>
    <dc:coverage>n</dc:coverage>
    <dc:language>o</dc:language>
  </rdf:Description>
</rdf:RDF>
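To show how a record of this shape maps back to statements, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not a conforming RDF/XML parser: the hypothetical `simple_triples` function handles only flat `rdf:Description` blocks with literal values, and the sample is a shortened version of the record above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened version of the simple DC record, two elements only.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:creator>a</dc:creator>
    <dc:title>l</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(rdf_xml):
    """Yield (subject URI, property URI, literal) for flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports tags as "{namespace}local"; joining the two
            # parts gives the property URI that the qualified name stands for.
            namespace, local = prop.tag[1:].split("}")
            yield subject, namespace + local, prop.text


for triple in simple_triples(RDF_XML):
    print(triple)
```

The qualified name `dc:creator` is only shorthand: the statement actually uses the property URI formed from the namespace URI and the local name, which is why the same code works for vocabularies it has never seen.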
RDF Vocabulary Description Language (RDF Schema)
• Provides mechanisms to describe
– terms used in RDF statements
– relationships between terms
– e.g. Dublin Core metadata element set described using RDF(S)
• Defines type system
– resources grouped into classes
– classes may be related hierarchically (subClassOf)
– properties may be related hierarchically (subPropertyOf)
– use of properties may be constrained (domain, range)
• More RDF statements
– i.e. metadata about metadata elements
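One practical consequence of the property hierarchy is that a service can answer a query phrased in terms of a general property using data recorded with a more specific one. The sketch below illustrates the subPropertyOf rule only; the local property `http://example.org/author` and its declared relationship to `dc:creator` are hypothetical, and the functions are not part of any RDF library.

```python
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
EX_AUTHOR = "http://example.org/author"  # hypothetical local property

# Schema statements: metadata about the metadata elements themselves.
schema = {(EX_AUTHOR, SUB_PROPERTY_OF, DC_CREATOR)}

# Instance data recorded with the more specific local property.
data = {("http://example.org/doc/1", EX_AUTHOR, "John")}


def super_properties(prop, schema):
    """All properties reachable from prop via subPropertyOf (transitively)."""
    found, todo = set(), [prop]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if s == current and p == SUB_PROPERTY_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def infer(data, schema):
    """Add the statements implied by the property hierarchy."""
    inferred = set(data)
    for s, p, o in data:
        for parent in super_properties(p, schema):
            inferred.add((s, parent, o))
    return inferred


for statement in sorted(infer(data, schema)):
    print(statement)
```

A service that only understands simple DC can then find the document by `dc:creator`, even though the data provider used its own, narrower term.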
Description of Dublin Core Creator
[Diagram: http://purl.org/dc/elements/1.1/creator has properties rdfs:label (value “Creator”), rdfs:comment (value “An entity …”), dc:description (value “Examples of a …”) and rdf:type (value http://www.w3.org/1999/02/22-rdf-syntax-ns#Property)]
Description of Dublin Core Creator (RDF/XML)
<rdf:Property rdf:about="http://purl.org/dc/elements/1.1/creator">
<rdfs:label xml:lang="en-US">Creator</rdfs:label>
<rdfs:comment xml:lang="en-US">An entity primarily responsible for making the content of the resource.</rdfs:comment>
<dc:description xml:lang="en-US">Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.</dc:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/"/>
<dcterms:issued>1999-07-02</dcterms:issued>
<dc:type rdf:resource="http://dublincore.org/usage/documents/principles/#element"/>
</rdf:Property>
Simplicity, contradiction, trust
• In RDF, meaning is expressed by simple statements:
– Subject-Predicate-Object
• Anyone on Web can assert (in RDF sense) anything about anything
– software agents navigating Web of statements
– may be able to process some of these statements but not all
– ignore the statements you don't understand
– tolerance of inconsistency and errors
• Establishing trust as fundamental part of Semantic Web infrastructure
– Who said this (and when etc)
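A rough sketch of how an agent might apply these principles: keep a note of who made each statement, skip properties it does not understand, and decide which sources to rely on. Everything here is hypothetical (the `Assertion` structure, the source URIs, the unknown `mood` property); it illustrates the idea rather than any Semantic Web trust mechanism.

```python
from collections import namedtuple

# A statement plus a record of who asserted it (hypothetical structure).
Assertion = namedtuple("Assertion", ["subject", "predicate", "object", "source"])

DC = "http://purl.org/dc/elements/1.1/"
UNDERSTOOD = {DC + "title", DC + "creator"}          # vocabulary this agent knows
TRUSTED = {"http://example.org/library-catalogue"}   # hypothetical trusted source

# Conflicting and unfamiliar statements can sit side by side in the store.
assertions = [
    Assertion("http://example.org/doc/1", DC + "creator", "John",
              "http://example.org/library-catalogue"),
    Assertion("http://example.org/doc/1", DC + "creator", "Jane",
              "http://example.org/anonymous-page"),
    Assertion("http://example.org/doc/1", "http://example.org/unknown/mood",
              "cheerful", "http://example.org/library-catalogue"),
]


def usable(assertions, understood, trusted):
    """Keep statements whose property is understood and whose source is trusted."""
    return [a for a in assertions
            if a.predicate in understood and a.source in trusted]


for a in usable(assertions, UNDERSTOOD, TRUSTED):
    print(a.object, "asserted by", a.source)
```

Nothing is deleted from the store: a different agent with a different trust policy or vocabulary could draw different conclusions from the same assertions.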
Metadata and the Semantic Web
• Argued that the Semantic Web principles fit the nature of metadata
– Metadata supports many different functions
– Metadata is inherently "modular"
– Metadata creation is not a one-off act, but an ongoing, distributed process
– the metadata creator can't predict how users may want to use resources and query metadata
– new uses of resources result in new metadata
– Metadata is not (or at least not only) "objective", "authoritative" information
– Some attributes represent interpretations
– Some attributes are context-dependent
– Multiple (even conflicting) descriptions can co-exist
Some RDF applications
RDF Site Summary (RSS) 1.0
• Simple RDF metadata vocabulary designed to support syndication of "news" items
• An RSS "channel" is published as an RDF/XML document
• Provides metadata about
– The channel itself
– A summary of its scope and purpose
– A sequence of items
– Summary descriptions of Web documents
• Content of channel regularly updated by provider
• Wide, simple, automated distribution
http://purl.org/rss/1.0/
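As an illustration of the channel-plus-items structure, the sketch below reads a very small RSS 1.0 document with Python's standard `xml.etree.ElementTree`. The channel content, URLs and the `read_channel` function are invented for the example; a real aggregator would also fetch the document over HTTP and handle optional modules.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Hypothetical channel with a single item.
CHANNEL = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news.rss">
    <title>Example news</title>
    <link>http://example.org/news</link>
    <description>Hypothetical channel used for illustration.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/news/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/news/1">
    <title>First item</title>
    <link>http://example.org/news/1</link>
  </item>
</rdf:RDF>
"""


def read_channel(text):
    """Return the channel title and a list of (item title, item link) pairs."""
    root = ET.fromstring(text)
    channel = root.find("rss:channel", NS)
    items = [
        (item.findtext("rss:title", namespaces=NS),
         item.findtext("rss:link", namespaces=NS))
        for item in root.findall("rss:item", NS)
    ]
    return channel.findtext("rss:title", namespaces=NS), items


title, items = read_channel(CHANNEL)
print(title)
for item_title, item_link in items:
    print("-", item_title, item_link)
```

Because the channel is RDF/XML, the same document could equally be loaded into an RDF store and merged with other descriptions of the same item URIs.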
RDF Site Summary (RSS) 1.0
• Typical applications
– Web sites: render content of specific channels as part of their own Web sites
– Online aggregator services: harvest numerous channels and provide search/filtering services across the items
– e.g. Meerkat
– Desktop news readers: allow users to "subscribe" to list of channels, regularly download content for user to browse
– e.g. Amphetadesk
• RSS also generated from some Weblog management systems
– SWAD(E) activity on "semantic weblogging"
http://www.ukoln.ac.uk/
Metadata schema registries
• How to encourage convergence and reuse of metadata vocabularies
• Implementers
– may be unaware of existing vocabularies
– adapt/customise "standard" terms for application-specific use
– may combine terms from multiple "standard" sources
– coin application-specific terms or extensions
• Application profile
– A metadata element set optimised for a particular application
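One way to picture an application profile is as a list of term URIs drawn from several namespaces. The sketch below groups the terms of a hypothetical e-prints profile by where they come from, which is roughly the view a registry would need in order to show reuse versus local coinage. The profile contents, the `http://example.org/eprints/terms/` namespace and both helper functions are assumptions for illustration.

```python
from collections import defaultdict

# Namespaces treated as "standard" for this sketch.
STANDARD_NAMESPACES = {
    "http://purl.org/dc/elements/1.1/": "DCMES 1.1",
    "http://purl.org/dc/terms/": "DCMI terms",
}

# Hypothetical profile mixing reused terms with one locally coined term.
PROFILE_TERMS = [
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/elements/1.1/creator",
    "http://purl.org/dc/terms/issued",
    "http://example.org/eprints/terms/peerReviewed",
]


def split_uri(uri):
    """Split a term URI into (namespace, local name) at the last '/' or '#'."""
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]


def terms_by_source(terms):
    """Group local names by the vocabulary they are taken from."""
    grouped = defaultdict(list)
    for uri in terms:
        namespace, local = split_uri(uri)
        source = STANDARD_NAMESPACES.get(namespace, "application-specific")
        grouped[source].append(local)
    return dict(grouped)


for source, names in terms_by_source(PROFILE_TERMS).items():
    print(source, names)
```

Identifying terms by URI is what makes this grouping possible: reuse of an existing term is visible as a shared namespace rather than a coincidence of element names.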
Metadata schema registries
• A publication context for
– "standard" metadata vocabularies and their terms
– (depending on scope of registry) also implementer usages/adaptations of those vocabularies and their terms
– To provide a "dictionary" function
– To highlight relationships, encourage reuse/convergence
• Based on indexing RDF data distributed on Web?
• Requires shared conventions for describing
– metadata vocabularies
– and their usages and adaptations
http://dublincore.org/dcregistry/
Summary
• RDF provides a common framework for making machine-processable statements about resources
• The “Semantic Web” provides a vision of metadata as
– modular, extensible
– distributed, devolved
– dynamic, evolving
• Seeks to address (some of) the challenges of cross-domain, cross-community interoperability
• Fundamental role of trust on the Semantic Web
Overall summary
• Global networks have created a new context for the delivery of services
• Metadata fundamental to service provision
• Services being built (successfully!)
– OAI PMH as a low-barrier technology
• No one-size-fits-all solution
• Debates, tensions, balances….
– automated processes v human labour
– domain-specific richness v cross-domain (over-?)simplicity
– standards v their implementation
– objectivity v subjectivity
– centralisation v distribution
• Emergence of a Semantic Web?
Acknowledgements
Parts of the content of this presentation are adapted from earlier presentations by:
Tom Baker (Fraunhofer-Gesellschaft, Berlin),
Michael Day, Rachel Heery, Paul Miller, and Andy Powell (UKOLN)
Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
http://www.ukoln.ac.uk/