A centre of expertise in digital information management

Approaches To The Validation Of Dublin Core Metadata Embedded In (X)HTML Documents

Background

The Dublin Core Metadata Element Set is a simple set of metadata elements used for resource discovery. It has been widely adopted in digital library applications. One simple mechanism for deploying DC metadata is to embed it in (X)HTML documents, following conventions recommended by DCMI.
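To make the embedding convention concrete, here is a minimal Python sketch that pulls "DC."-prefixed meta elements out of a page. The sample page, the `DCMetaParser` class and the `extract_dc` helper are illustrative assumptions, not part of any tool described in this poster; the `link`/`meta` markup follows the DCMI convention of declaring a `schema.DC` link and naming properties `DC.element`.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta> elements whose name starts with the 'DC.' prefix."""

    def __init__(self):
        super().__init__()
        self.properties = []  # list of (name, scheme, content) tuples

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.upper().startswith("DC."):
            self.properties.append(
                (name, attr.get("scheme"), attr.get("content"))
            )


def extract_dc(html_text):
    """Return the embedded DC properties found in an (X)HTML string."""
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.properties


# Hypothetical sample page, for illustration only.
SAMPLE = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" content="An Example Page" />
<meta name="DC.creator" content="A. N. Author" />
<meta name="DC.date" scheme="W3CDTF" content="2003-09-28" />
<meta name="keywords" content="not Dublin Core" />
</head><body></body></html>"""

if __name__ == "__main__":
    for name, scheme, content in extract_dc(SAMPLE):
        print(name, scheme, content)
```

The sketch only reads the metadata; it makes no judgement about whether the values are correct.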

The Problem

Many (X)HTML document creators limit their "validation" to checking the presentation of their documents in Web browsers. Even where authors do use (X)HTML syntax validators, such tools do not check that embedded metadata conforms to the conventions recommended by DCMI. Furthermore, to be really useful to the metadata creator, a validation process should check the metadata against the specific requirements of the service that will use that metadata (an "application profile").

A Simple Approach To Validation

Use of DC-dot

DC-dot is a popular Web-based tool for creating and managing Dublin Core metadata. DC-dot can also be used to carry out simple validation of Dublin Core embedded in HTML resources.

Survey Findings

Use of DC-dot across a digital library programme showed that the entry points contained various errors in the representation of Dublin Core:

• Use of DC.Author rather than DC.Creator
• Incorrect format of date field
• Incorrect use of delimiters
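The kinds of error listed above could be caught by a few simple checks, sketched below in Python. This is not how DC-dot works; the `check_property` helper is hypothetical. The poster does not say which date format or delimiter rules were expected, so the sketch assumes W3CDTF-style dates (YYYY, YYYY-MM or YYYY-MM-DD) and a dot between the "DC" prefix and the element name.

```python
import re

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Assumed date shapes: YYYY, YYYY-MM, YYYY-MM-DD (time-of-day forms omitted).
W3CDTF_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def check_property(name, content):
    """Return a list of problem descriptions for one embedded property."""
    prefix, dot, rest = name.partition(".")
    if prefix.upper() != "DC" or not dot:
        return [f"{name}: expected a 'DC.' prefix with a dot delimiter"]
    element = rest.split(".")[0].lower()  # ignore any refinement suffix
    problems = []
    if element == "author":
        problems.append(f"{name}: not a DC element; use DC.Creator")
    elif element not in DC_ELEMENTS:
        problems.append(f"{name}: not one of the 15 DC elements")
    if element == "date" and not W3CDTF_DATE.fullmatch(content or ""):
        problems.append(f"{name}: date {content!r} is not YYYY[-MM[-DD]]")
    return problems


if __name__ == "__main__":
    for name, content in [("DC.Author", "A. N. Author"),
                          ("DC.Date", "28/09/2003"),
                          ("DC:Title", "Home page")]:
        print(check_property(name, content))
```

A service with different conventions would need different rules, which is the motivation for checking against a configurable profile rather than a fixed rule set.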

Limitations of DC-dot

DC-dot has some limitations:

• It was not designed primarily as a validation tool
• It performs only basic validation
• It validates against a single set of rules

The DC-dot Tool

Using An RDF Validator

Use of An RDF Validator

An alternative approach was to make use of W3C's online Dublin Core to RDF XSLT transformation service and the RDF validator. This approach made use of several online services which were chained together:

• Tidy to convert project home page to XHTML format
• Dublin Core to RDF XSLT transformation service to convert embedded Dublin Core elements to RDF/XML
• RDF validation service to validate the RDF/XML

Comments

This approach helped by providing a visual display of the Dublin Core metadata. It was noticed, for example, that one page contained an invalid identifier: <http:/www.foo.ac.uk/...> rather than <http://www.foo.ac.uk/...>.

However, since the RDF validation service has no understanding of the semantics of the Dublin Core metadata, this approach has its limitations.
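The identifier error mentioned above is the kind of problem a purpose-built check can flag automatically. The Python sketch below is an assumption about how such a check might look, not part of the RDF validation service; `identifier_problem` is a hypothetical helper. It relies on the fact that a string with a single slash after "http:" parses as a scheme plus a path with no host.

```python
from urllib.parse import urlsplit


def identifier_problem(value):
    """Return a description of the problem, or None if none is found.

    Only one check is made: an http(s) identifier must name a host.
    """
    parts = urlsplit(value)
    if parts.scheme in ("http", "https") and not parts.netloc:
        return f"{value}: http(s) identifier has no host (missing '//'?)"
    return None


if __name__ == "__main__":
    # The two forms quoted in the comments above, with the elided path kept.
    print(identifier_problem("http:/www.foo.ac.uk/..."))
    print(identifier_problem("http://www.foo.ac.uk/..."))
```

A generic validator has no reason to reject the first form, because nothing tells it that a DC.Identifier on a Web page is expected to be a resolvable HTTP address.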

The RDF Validator Tool

dcmeta: An XSLT Approach

Use of XSLT

We have employed XSLT to provide validation of Dublin Core metadata embedded in (X)HTML resources.

The dcmeta XSLT stylesheet:

• Creates a report on the embedded DC metadata
• Checks that general conventions for DC metadata are followed
• Checks the metadata against a specified "application profile" of the DC Metadata Element Set

The profile is a set of rules which specify:

• Permitted DC properties (e.g. only the 15 DC elements are allowed)
• Minimum/maximum permitted occurrences of a specified property (e.g. only one occurrence of DC.Title permitted)
• Permitted encoding schemes (e.g. DC.Subject properties should have the scheme "LCSH")
• Permitted values (e.g. DC.Publisher must have the value "UKOLN")

These rules are described in a secondary XML document read by the stylesheet.

The dcmeta Tool
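The four kinds of profile rule described for dcmeta (permitted properties, occurrence limits, encoding schemes and values) can be illustrated with a short Python sketch. This is not the dcmeta stylesheet, which is written in XSLT and reads its rules from an XML document whose structure is not given here; the `PROFILE` dictionary layout and the `validate` function are assumptions made purely for illustration, using the example rules quoted in this poster.

```python
from collections import Counter

# Hypothetical in-memory profile; the real tool reads an XML profile document.
PROFILE = {
    "title": {"min": 1, "max": 1},
    "subject": {"min": 0, "max": None, "schemes": {"LCSH"}},
    "publisher": {"min": 1, "max": 1, "values": {"UKOLN"}},
}


def validate(properties, profile):
    """Check (element, scheme, content) tuples against a profile.

    Element names are lower-case and without the 'DC.' prefix.
    Returns a list of problem descriptions (empty if none found).
    """
    problems = []
    counts = Counter(element for element, _, _ in properties)
    for element, scheme, content in properties:
        rule = profile.get(element)
        if rule is None:
            problems.append(f"DC.{element}: property not permitted")
            continue
        schemes = rule.get("schemes")
        if schemes is not None and scheme not in schemes:
            problems.append(f"DC.{element}: scheme {scheme!r} not permitted")
        values = rule.get("values")
        if values is not None and content not in values:
            problems.append(f"DC.{element}: value {content!r} not permitted")
    for element, rule in profile.items():
        n = counts[element]
        if n < rule.get("min", 0):
            problems.append(f"DC.{element}: too few occurrences ({n})")
        maximum = rule.get("max")
        if maximum is not None and n > maximum:
            problems.append(f"DC.{element}: too many occurrences ({n})")
    return problems


if __name__ == "__main__":
    page = [("title", None, "Home"), ("title", None, "Welcome"),
            ("subject", None, "Metadata"), ("publisher", None, "UKOLN")]
    for problem in validate(page, PROFILE):
        print(problem)
```

Keeping the rules in data rather than in the checking code is what allows one tool to serve several services, each with its own profile.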

Conclusions

Deployment

The stylesheet can be deployed using any XSLT engine, for example:

• Using a JavaScript bookmarklet to apply the transformation in a browser with a built-in XSLT engine (e.g. IE/MSXML)
• As an online service using a server-side transformation
• Run from the command line

Summary

This poster summarises a number of approaches to validating Dublin Core metadata embedded in HTML resources. The poster reports on initial work in the development of an XSLT-based tool which can be used for validation of Dublin Core metadata.

Further Details

The stylesheet is available, together with details of the structure of the "profile" document, at <http://www.ukoln.ac.uk/metadata/dcmeta/>

For further information please contact Pete Johnston at the email address <[email protected]>