XML and “meta-tagging” Technical seminar for Pathfinder LEAs,
BECTa, Coventry, 26 February 2002
Pete Johnston
UKOLN, University of Bath
Bath, BA2 7AY
UKOLN is supported by:
Email [email protected]
URL http://www.ukoln.ac.uk/
XML and "meta-tagging", BECTa Pathfinders, Coventry, 26 Feb 2002
XML and “meta-tagging”
• What is metadata & what is it used for?
• Sharing metadata
– semantics : introducing the Dublin Core
– syntax : introducing the Extensible Markup Language (XML)
– structure : the limits of XML
• Introducing the Resource Description Framework (RDF)
What is metadata?
• “Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person.”
– Dempsey and Heery, 1998
• “Machine understandable information about web resources or other things.”
– Berners-Lee, 1997
• Structured data about resources that can be used to help support a wide range of operations
What resources, objects, things?
• HTML documents
• digital images
• databases
• books
• museum objects
• archival records
• metadata records
• Web sites
• collections
• services
• physical places
• people
• abstract “works”
• concepts
• events
Who/what is metadata for?
• Used by
– human agents (owner, user/researcher, 3rd party services)
– software agents (e.g. aggregators, portals, brokers)
• Different “flavours” of metadata serve different purposes
– simple, generic vs. rich, specific
– published widely vs. shared within community vs. used by resource owner/manager
• Created by
– software tools (resource creation tools, indexing robots/web crawlers)
– human agents (resource creator/owner, other parties)
Metadata embedded in resource
e.g. meta elements in HTML docs; summary properties in word processor docs
Can resource support embedding of metadata?
Does metadata creator have write access to resource?
Can service extract embedded metadata?
Metadata about aggregates of resources?
Metadata about people, places, concepts?
[Diagram: Resource 1 with metadata embedded in it: Creator = J Smith; Date = 2001-11-05; Title = Report]
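As an illustration of a service extracting embedded metadata, the sketch below reads meta elements from an HTML page using Python's standard html.parser module. The page content and the "DC." name prefix are assumptions made for the example (the prefix follows a common convention for Dublin Core in HTML); the slides do not prescribe either.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # A name may occur more than once, so keep a list per name.
            self.metadata.setdefault(name, []).append(content)


# Hypothetical page carrying the example record from the diagram.
html_doc = """<html><head>
<title>Report</title>
<meta name="DC.creator" content="J Smith">
<meta name="DC.date" content="2001-11-05">
<meta name="DC.title" content="Report">
</head><body><p>...</p></body></html>"""

extractor = MetaExtractor()
extractor.feed(html_doc)
print(extractor.metadata)
```

The extraction only works if the resource format supports embedding and the service can fetch and parse the resource, which are the questions the slide raises.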
Metadata linked from resource
e.g. link elements in HTML docs
Metadata record may be remote from resource
Can resource support embedding of link?
Does metadata creator have write access to resource?
Can service follow link to metadata record?
Metadata about aggregates of resources?
Metadata about people, places, concepts?
[Diagram: Resource 1 holds a link (Metadata rec = 1) to Metadata rec 1: Creator = J Smith; Date = 2001-11-05; Title = Report]
Metadata points to resource
e.g. most metadata records…
Metadata record may be remote from resource
Does not require embedding of metadata or link
Does not require metadata creator to have write access to resource
Service obtains metadata record independently of resource
Metadata record can describe anything (with identifier…)
Metadata record may persist after resource deleted
[Diagram: Metadata rec 1 (Creator = J Smith; Date = 2001-11-05; Title = Report; Doc = 1) points to Resource 1]
Metadata managed in database
Metadata content stored in database, exposed in form(s) appropriate for service(s)

Doc  Creator  Date        Title
1    J Smith  2001-11-05  Report
What operations?
Owner / manager / provider: establish control of own resources; administer/manage (through time); disclose/promote own resources widely; enable and control access/use; contextualise
Other metadata creator: disclose/promote resources (including resources owned by others); re-contextualise (re-describe, annotate)
Discovery service: disclose/promote resources from range of providers; re-contextualise (re-describe, annotate); facilitate user discovery
End user: find, identify, select resources from range of providers; obtain/use; interpret
Single resource provider
[Diagram: one Website with its Resources and Metadata. Resource owner = Metadata creator = Service provider]
Multiple resource providers
[Diagram: a Portal and several Websites, each Website with its own Metadata. Labels: Multiple resource owners/Metadata creators/Local service providers; Separate portal service provider]
[Diagram: as the previous diagram, with Other metadata creators supplying further Metadata about the Websites]
[Diagram: Portal A and Portal B, each shown with several Websites and their Metadata]
Metadata for resource discovery
• Metadata for resource discovery
– is used beyond its creator community
– is combined/compared with metadata from other communities
– is aggregated or cross-searched by services
• Challenges of “interoperability”
– How does a metadata provider make metadata records available in a commonly understood form?
– (How does a service provider obtain these metadata records from data providers?)
How is metadata shared?
• Metadata as language; metadata records as sets of statements
• Effective transmission of information requires agreement on
– semantics
  – what terms mean
  – e.g. “cat”, “to sit”, “mat”
– structure
  – significance of arrangement of terms
  – e.g. sentence: subject -> verb -> object (in English….)
– syntax
  – rules of expression
  – “The cat sat on the mat.”
Introducing the Dublin Core
• Initiative to improve resource discovery on Web
– not for complex resource description
– simple “document-like objects”
– extended to other classes of resource
• Interdisciplinary consensus on simple element set
– 15 elements
– all optional
– all repeatable
Introducing the Dublin Core (2)
• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
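Because all 15 elements are optional and repeatable, checking a simple Dublin Core record comes down to checking its element names. A minimal sketch, assuming a record is held as a Python dict mapping element names to lists of values (that representation and the function name are choices made for this example, not part of Dublin Core):

```python
# The 15 elements of the Dublin Core element set, as listed on the slide.
DC_ELEMENTS = frozenset({
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record):
    """Return the names in record that are not Dublin Core elements.

    Every element is optional and repeatable, so an empty record and
    repeated values are both acceptable; only the names are checked.
    """
    return sorted(set(record) - DC_ELEMENTS)


record = {
    "title": ["Report"],
    "creator": ["J Smith", "A Jones"],  # repeated element
    "date": ["2001-11-05"],
}
problems = unknown_elements(record)
```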
Introducing the Dublin Core (3)
• Simplicity of semantics, ease of use
• Provides basic semantic interoperability
– across domains
– across language communities
• Allows for extensibility
– but tension between extending DC and choosing other, richer schema
• Interoperability requires
– use of content rules/standards
– clarity about resource being described
  – e.g. digital surrogate v physical “original”
Using the Dublin Core
• Not a replacement for richer descriptive standards
• A “pidgin” language for use by “tourists on the Internet commons”
– Tom Baker, “A Grammar of Dublin Core”
• Can provide 15 “windows” into richer resource descriptions
– disclose rich description in simple form
– semantic cross-walks, mappings
– (if you have rich descriptions, then) export rather than create?
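The "export rather than create" idea can be sketched as a crosswalk: a table mapping fields of a richer local schema onto Dublin Core elements. The local field names, the mapping and the sample record below are all hypothetical, invented for illustration.

```python
# Hypothetical crosswalk from a richer local schema to Dublin Core elements.
CROSSWALK = {
    "main_title": "title",
    "author": "creator",
    "illustrator": "contributor",
    "date_issued": "date",
    "abstract": "description",
}


def to_dublin_core(rich_record):
    """Export a rich record as simple Dublin Core.

    Fields with no entry in the crosswalk are dropped: the simple
    record is a window onto the rich one, not a replacement for it.
    """
    dc = {}
    for field, value in rich_record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)
    return dc


rich = {
    "main_title": "Report",
    "author": ["J Smith"],
    "illustrator": "A Jones",
    "date_issued": "2001-11-05",
    "shelf_mark": "R/2001/17",  # local detail with no mapping in this crosswalk
}
dc_record = to_dublin_core(rich)
```

Several local fields may map to the same element (here both roles could map to contributor), which is why the export loses detail that the rich description retains.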
Introducing XML
• Extensible Markup Language
– Recommendation of W3C, 1998, 2000
• Defines means of describing tree-structured data in text-based format
– embedded markup delimits and describes data
• “Meta-language”
– language for describing markup languages
– can define unlimited number of markup languages
• Widely adopted for transferring data between programs, systems
Introducing XML (2)
• Simple syntax
• Rules of XML made public so any programmer can write parser
• Many parsers available for application developer
– reusable software components
– standard programming interfaces
• Data independent of platform
• Support from major software vendors
– use of XML increasingly invisible to user
• Foundation for “Web services”
– distributed applications invoked over Web
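To show what a reusable parser with a standard programming interface looks like in practice, here is a small sketch using the ElementTree parser from Python's standard library. The two sample strings are made up for the example; the second is deliberately not well-formed (its end tag does not match its start tag), so any conforming parser must reject it.

```python
import xml.etree.ElementTree as ET

well_formed = "<record><doc>1</doc><creator>J Smith</creator></record>"
broken = "<record><doc>1</doc><creator>J Smith</text></record>"


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The application works with the parsed tree, not with raw text.
record = ET.fromstring(well_formed)
creator = record.findtext("creator")
```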
<table>
  <record>
    <doc>1</doc>
    <creator>J Smith</creator>
    <date>2001-11-05</date>
    <title>Report</title>
  </record>
</table>

Doc  Creator  Date        Title
1    J Smith  2001-11-05  Report

[Diagram: the same data as a tree: table -> record -> doc (1), creator (J Smith), date (2001-11-05), title (Report)]
[Diagram: database table (Doc, Creator, Date, Title) -> Serialisation -> <record>...</record> <record>...</record> -> Transmission -> De-serialisation -> Remote application]
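The serialise, transmit, de-serialise round trip in the diagram can be sketched as follows. The element names match the table/record example in these slides; the function names and the in-memory row format are assumptions for the example, and the transmission step itself is left out.

```python
import xml.etree.ElementTree as ET

FIELDS = ("doc", "creator", "date", "title")


def serialise(rows):
    """Turn database rows (dicts) into an XML string."""
    table = ET.Element("table")
    for row in rows:
        record = ET.SubElement(table, "record")
        for field in FIELDS:
            ET.SubElement(record, field).text = row[field]
    return ET.tostring(table, encoding="unicode")


def deserialise(xml_text):
    """Rebuild the rows from the XML string on the receiving side."""
    table = ET.fromstring(xml_text)
    return [
        {field: record.findtext(field) for field in FIELDS}
        for record in table.findall("record")
    ]


rows = [
    {"doc": "1", "creator": "J Smith", "date": "2001-11-05", "title": "Report"},
]
xml_text = serialise(rows)        # what would be sent to the remote application
received = deserialise(xml_text)  # what the remote application rebuilds
```

The round trip works only because sender and receiver agree on the element names and their arrangement, which is the point the following slides develop.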
XML : document types & vocabularies
• “XML lets me make up names for element types! Great!”
• But….
– XML says nothing about what your names mean
– will a human recipient of your document recognise your <level> element?
– will a software agent process your <level> element correctly?
• Communication requires consensus on
– structural model of class of document/data
– labelling of components
– semantics of components
• Shared use of common XML “vocabularies”
XML : DTDs, XML Schemas
• Means to codify syntax rules of vocabulary
– what markup is allowed
– structural constraints on use of markup
– N.B. say nothing about what markup means
• Document Type Definition
– part of XML Recommendation
• W3C XML Schema
– recent W3C recommendation
– data-typing i.e. tighter control on element content
– support for combining vocabularies
– uses XML syntax
• Parser/authoring tool can validate markup of instance against rules in DTD or Schema
XML : namespaces
• Applications wish to use elements from multiple vocabularies (DTDs/Schemas)
– particularly true of metadata applications
– problems of “name collisions”
• XML Namespaces
– recommendation of W3C
– provides universal naming mechanism
• Namespace
– a collection of names
– given a name, which has the form of a URI
• Element type names, attribute names qualified by a namespace name (a URI)
– through use of prefix
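A short sketch of how namespaces resolve a name collision, using Python's ElementTree. Two vocabularies both define a "title" element; the prefixes in the document are only shorthand, and the parser reports each element under its namespace URI. The Dublin Core namespace URI is the published one; the second namespace and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/schools/terms/"  # hypothetical vocabulary

doc = f"""<record xmlns:dc="{DC}" xmlns:sch="{LOCAL}">
  <dc:title>Report</dc:title>
  <sch:title>Head of Department</sch:title>
</record>"""

root = ET.fromstring(doc)

# ElementTree writes a qualified name as {namespace-URI}local-name.
qualified_names = [child.tag for child in root]

# Each "title" is looked up by namespace URI, not by prefix.
resource_title = root.findtext(f"{{{DC}}}title")
job_title = root.findtext(f"{{{LOCAL}}}title")
```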
The problem with XML
• Statement
– this resource (song, document, picture... etc!) has dc:creator “Don Van Vliet”
• Multiple expressions in XML
<song id="123"><title>Frownland</title><creator>Don Van Vliet</creator></song>
<lyric id="456" title="Frownland"><creator name="Don Van Vliet"/></lyric>
<music id="789" creator="Don Van Vliet"><title text="Frownland"/></music>
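The consequence for software can be sketched in a few lines of Python: each of the three document types needs its own hand-written rule to recover the same statement, and a fourth, unknown type yields nothing. The extractor table and the "track" document are inventions for this example.

```python
import xml.etree.ElementTree as ET

documents = [
    '<song id="123"><title>Frownland</title><creator>Don Van Vliet</creator></song>',
    '<lyric id="456" title="Frownland"><creator name="Don Van Vliet"/></lyric>',
    '<music id="789" creator="Don Van Vliet"><title text="Frownland"/></music>',
]

# One rule per document type: prior knowledge of each structural convention.
EXTRACTORS = {
    "song": lambda root: root.findtext("creator"),
    "lyric": lambda root: root.find("creator").get("name"),
    "music": lambda root: root.get("creator"),
}


def find_creator(xml_text):
    """Return the creator if the document type is known, else None."""
    root = ET.fromstring(xml_text)
    extractor = EXTRACTORS.get(root.tag)
    return extractor(root) if extractor else None


creators = [find_creator(d) for d in documents]

# A well-formed document in a structure the program has never seen.
unknown = find_creator("<track><by>Don Van Vliet</by></track>")
```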
The problem with XML (2)
• Different communities make different design choices for DTDs/XML Schemas
– all “good” (and valid)
– human reader of document can interpret (maybe)
– program needs prior “knowledge” of structural conventions in each XML schema
• Within resource description community, meaning(s) of structure(s) may be limited
• Across communities, potentially unlimited
– not scalable in an “open” environment
– how to manage ever increasing set of conventions
– always encountering unknown schemas
The problem with XML (3)
“XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.”
– Berners-Lee, 2001
• Consensus on syntax
– use of XML
• Consensus on semantics of terms
– meaning of (uniquely named through XML namespace) elements/attributes
• No consensus on meaning of structure
– e.g. parent-child element relations
Introducing RDF
• Resource Description Framework Model & Syntax
– Recommendation of W3C, 1999
• Generic “architecture” for metadata
– set of conventions for applications exchanging metadata
– allow semantics to be defined by different resource description communities
– accommodate mixing of metadata from diverse sources
Introducing RDF (2)
• Defines
– model for making statements about resources
– conventions for encoding statements using XML syntax
• Resource : any object identified by URI
– not necessarily accessible via Web
• Property : “attribute” to describe resource
– properties also uniquely identified by URI
• Statement : “triple” of specific resource, named property, and value
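One way to picture a statement is as a three-part record. The sketch below uses a Python NamedTuple; the field names, the document URI and the sample values are assumptions for illustration, while the property URI is the Dublin Core creator element.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """An RDF-style triple: resource, property, value."""
    resource: str  # URI of the thing described
    prop: str      # URI of the property
    value: str     # a literal string, or the URI of another resource


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical document URI and creator name.
statement = Statement(
    resource="http://example.org/doc/1",
    prop=DC_CREATOR,
    value="J Smith",
)

# Because resource and property are URIs, two statements made
# independently can be recognised as being about the same thing.
same_subject = statement.resource == "http://example.org/doc/1"
```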
The RDF model
[Diagram: http://js.org/doc/1 --author--> “John”]
A resource has some property whose value is either (i) a simple string value (literal)…
– The resource identified by the URI http://js.org/doc/1 has a property “author” whose value is “John”
– Or, “John” is the “author” of the resource identified by http://js.org/doc/1
The RDF model (2)
… or (ii) another resource...
[Diagram: http://js.org/doc/1 --author--> (unnamed resource); (unnamed resource) --name--> “John”; (unnamed resource) --email--> “[email protected]”]
– The value of property “author” is another resource which has a property “name” with value “John” and a property “email” with value “[email protected]”
The RDF model (3)
… which may itself have a URI
[Diagram: http://js.org/doc/1 --author--> http://js.org/person/john; http://js.org/person/john has properties name (“John”) and email]
The power of the RDF model
• Extensible model
– supports any vocabularies
• Supports arbitrary complexity of description
• URIs as unique fixed points to identify
– resources
– properties
• Descriptions created independently can be “merged” using URIs as “anchors”
– i.e. supports distributed metadata
First source
[Diagram: http://js.org/doc/1 --author--> http://js.org/person/john; http://js.org/person/john has properties name (“John”) and email]
Second source
[Diagram: http://js.org/doc/1 --subject--> “XML”]
Third source
[Diagram: http://js.org/person/john --organisation--> “JS Foundation”]
Three descriptions merged
[Diagram: http://js.org/doc/1 --author--> http://js.org/person/john and --subject--> “XML”; http://js.org/person/john has properties name (“John”), email and organisation (“JS Foundation”)]
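The merge can be sketched with sets of triples: because the three sources use the same URIs for the document and the person, a plain set union joins them. The resource URIs are those in the diagrams; the property URIs under example.org are hypothetical (the slides name the properties only as "author", "name" and so on), and the email statement is left out because its value is obscured in the source.

```python
# Hypothetical property URIs; the slides give only short property names.
AUTHOR = "http://example.org/terms/author"
NAME = "http://example.org/terms/name"
SUBJECT = "http://example.org/terms/subject"
ORGANISATION = "http://example.org/terms/organisation"

DOC = "http://js.org/doc/1"
JOHN = "http://js.org/person/john"

# Three descriptions created independently.
first = {(DOC, AUTHOR, JOHN), (JOHN, NAME, "John")}
second = {(DOC, SUBJECT, "XML")}
third = {(JOHN, ORGANISATION, "JS Foundation")}

# Merging is a set union: the shared URIs act as the anchors.
merged = first | second | third


def values(graph, resource, prop):
    """Return the values of prop for resource in graph."""
    return sorted(v for (s, p, v) in graph if s == resource and p == prop)


# A question no single source can answer:
# which organisation does the author of the document belong to?
author_orgs = [
    org
    for author in values(merged, DOC, AUTHOR)
    for org in values(merged, author, ORGANISATION)
]
```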
The RDF XML syntax
• XML representation of model
– to store/exchange descriptions
• All property names made unique through use of XML namespaces
• Conventions for the meaning of structures in XML document
• Service can “know in advance” the meaning of structures
– even if unanticipated vocabularies used
– “partial understanding”
– can read multiple descriptions into store and “merge” on URIs
• Generated by tools…. more later!
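To make "conventions for the meaning of structures" concrete, the sketch below reads one very simple RDF/XML pattern into triples with Python's ElementTree. It is not an RDF parser: it assumes only rdf:Description elements whose children are properties with either literal text or an rdf:resource attribute, and the sample description is invented for the example. A real service would use a dedicated RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical description using the resource URIs from the earlier slides.
rdf_xml = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://js.org/doc/1">
    <dc:title>Report</dc:title>
    <dc:creator rdf:resource="http://js.org/person/john"/>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(text):
    """Read (resource, property, value) triples from simple RDF/XML.

    Handles only rdf:Description elements with literal or
    rdf:resource property children; everything else is ignored.
    """
    triples = []
    root = ET.fromstring(text)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            # ElementTree names elements {namespace}local; joining the
            # two parts gives the URI of the property.
            predicate = prop.tag.replace("{", "").replace("}", "")
            value = prop.get(f"{{{RDF}}}resource", (prop.text or "").strip())
            triples.append((subject, predicate, value))
    return triples


triples = read_triples(rdf_xml)
```

The reader needs no knowledge of the Dublin Core vocabulary: the fixed structure (description, property, value) tells it what each element means, so a property from an unanticipated vocabulary would be read in exactly the same way.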
RDF Schema
• Resource Description Framework Schema
– Candidate Recommendation of W3C, 2000
• Provides mechanisms to describe
– terms used in RDF statements
– semantic relationships between terms
  – e.g. Dublin Core metadata element set defined using RDF(S)
• Defines type system
– resources grouped into classes
– classes related hierarchically (subClassOf)
– properties related hierarchically (subPropertyOf)
– use of properties constrained (domain, range)
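A sketch of what a property hierarchy buys a service: if a schema declares a local property to be a sub-property of a Dublin Core element, a query on the general element can also return values stated with the specific one. The local properties, the sample statements and the single-parent simplification are all assumptions made for this example; RDF Schema itself allows a property to have several super-properties.

```python
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"
ILLUSTRATOR = "http://example.org/terms/illustrator"   # hypothetical
COVER_ARTIST = "http://example.org/terms/coverArtist"  # hypothetical

# subPropertyOf declarations, simplified to one parent per property.
SUB_PROPERTY_OF = {
    ILLUSTRATOR: DC_CONTRIBUTOR,
    COVER_ARTIST: ILLUSTRATOR,
}


def super_properties(prop):
    """Yield prop and every property it specialises, nearest first."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = SUB_PROPERTY_OF.get(prop)


def values(graph, resource, prop):
    """Values of prop for resource, including those stated via sub-properties."""
    return sorted(
        v for (s, p, v) in graph
        if s == resource and prop in super_properties(p)
    )


DOC = "http://js.org/doc/1"
graph = {
    (DOC, COVER_ARTIST, "A Jones"),
    (DOC, DC_CONTRIBUTOR, "B Brown"),
}

contributors = values(graph, DOC, DC_CONTRIBUTOR)
illustrators = values(graph, DOC, ILLUSTRATOR)
```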
RDF Schema (2)
• RDF Schema employs RDF model
– expressible using RDF/XML syntax
• Other “ontology languages” building on RDF/RDFS
– e.g. DAML+OIL
– describe more complex relations between entities
• Berners-Lee’s vision of “Semantic Web”
– software agents navigating web of machine-processable descriptions and “ontologies”
– making inferences about data collected
– communicating via “partial understanding”
Summary
• Resource discovery metadata is shared across boundaries of domain, sector etc
• Effective sharing requires consensus on
– semantics : shared vocabularies of uniquely named terms
– syntax : XML
– structure : common XML DTD/schema or RDF?
• Simple RDF model as basis of “machine-processable” statements about resources
Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
http://www.ukoln.ac.uk/