A centre of expertise in digital information management
Approaches To The Validation Of Dublin Core Metadata Embedded In (X)HTML Documents
Background
The Dublin Core Metadata Element Set is a simple set of metadata elements used for resource discovery. It has been widely adopted in digital library applications. One simple mechanism for deploying DC metadata is to embed it in (X)HTML documents, following conventions recommended by DCMI.
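In the DCMI-recommended form, each property sits in a `<meta>` element whose `name` carries a `DC.` prefix, and a `<link rel="schema.DC">` element points at the element set's namespace. The sketch below uses only Python's standard `html.parser` to pull the `DC.*` properties out of a page as `(name, scheme, content)` tuples. The class and function names are hypothetical. A fuller checker would also examine the `schema.DC` link and any `lang` attributes, which this sketch ignores.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collects DC.* <meta> elements as (name, scheme, content) tuples."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.properties.append(
                (name, attributes.get("scheme"), attributes.get("content") or "")
            )
    # XHTML-style <meta ... /> tags also reach handle_starttag, because the
    # default handle_startendtag calls it.


def extract_dc(html):
    """Return the DC.* properties embedded in an (X)HTML string."""
    parser = DCMetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Names without the `DC.` prefix, such as `keywords`, are skipped. A `scheme` attribute, if present, is kept alongside the value.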
The Problem
Many (X)HTML document creators limit their "validation" to checking how their documents are presented in Web browsers. Even where authors do use (X)HTML syntax validators, such tools do not check that embedded metadata conforms to the conventions recommended by DCMI. Furthermore, to be really useful to the metadata creator, a validation process should check the metadata against the specific requirements of the service that will use that metadata (an "application profile").
A Simple Approach To Validation
Use of DC-dot
DC-dot is a popular Web-based tool for creating and managing Dublin Core metadata. DC-dot can also be used to carry out simple validation of Dublin Core embedded in HTML resources.
Survey Findings
Use of DC-dot across a digital library programme showed that the entry points contained various errors in the representation of Dublin Core:
• Use of DC.Author rather than DC.Creator
• Incorrect format of date field
• Incorrect use of delimiters
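The first two kinds of error are easy to detect mechanically. The sketch below illustrates this under stated assumptions and does not reproduce DC-dot's rules. It flags `DC.Author`, flags any name outside the 15 DC elements, and accepts only the W3CDTF date forms `YYYY`, `YYYY-MM` and `YYYY-MM-DD` for `DC.Date`. `check_common_errors` is a hypothetical name, and its input is a list of `(name, scheme, content)` tuples taken from `<meta>` elements. There is no delimiter check, because the findings do not say which delimiters were misused.

```python
import re

# Assumed rule set for illustration only; not the checks DC-dot performs.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# W3CDTF date-only forms: YYYY, YYYY-MM or YYYY-MM-DD (times not handled here).
W3CDTF_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def check_common_errors(properties):
    """Return problem messages for a list of (name, scheme, content) tuples."""
    problems = []
    for name, _scheme, content in properties:
        if "." not in name:
            continue  # not a prefixed DC name, so not checked here
        # "DC.Date.Created" -> base element "date"; refinements are ignored.
        base = name.split(".")[1].lower()
        if base == "author":
            problems.append(f"{name}: use DC.Creator rather than DC.Author")
        elif base not in DC_ELEMENTS:
            problems.append(f"{name}: not one of the 15 DC elements")
        if base == "date" and not W3CDTF_DATE.match(content.strip()):
            problems.append(f"{name}: {content!r} is not a W3CDTF date")
    return problems
```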
Limitations of DC-dot
DC-dot has some limitations:
• It was not designed primarily as a validation tool
• It performs only basic validation
• It validates against a single set of rules
The DC-dot Tool
Using An RDF Validator
Use of An RDF Validator
An alternative approach was to make use of W3C's online Dublin Core to RDF XSLT transformation service and the RDF validator. This approach made use of several online services which were chained together:
• Tidy to convert the project home page to XHTML format
• Dublin Core to RDF XSLT transformation service to convert embedded Dublin Core elements to RDF/XML
• RDF validation service to validate the RDF/XML
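The middle step of that chain, turning embedded `DC.*` properties into RDF/XML, can be approximated in a few lines. This is a hypothetical Python sketch using `xml.etree.ElementTree` and is not W3C's XSLT transformation, whose output may differ. It emits one `rdf:Description` about the page and uses the DC 1.1 namespace `http://purl.org/dc/elements/1.1/`. Each name becomes a lower-case property, so `DC.Title` becomes `dc:title`. Refinements and schemes are dropped, and every value is written as a plain literal.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional rdf: and dc: prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_to_rdfxml(page_uri, properties):
    """Serialise (name, scheme, content) tuples as one rdf:Description of page_uri."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": page_uri}
    )
    for name, _scheme, content in properties:
        # "DC.Title" -> "title"; any refinement ("DC.Date.Created") is dropped.
        element = name.split(".")[1].lower()
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = content
    return ET.tostring(root, encoding="unicode")
```

The resulting string could then be passed to an RDF validator. As the comments below note, though, that checks only RDF syntax and says nothing about whether the Dublin Core is sensible.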
Comments
This approach helped by providing a visual display of the Dublin Core metadata. It was noticed, for example, that one page contained an invalid identifier: <http:/www.foo.ac.uk/...> rather than <http://www.foo.ac.uk/...>. However, since the RDF validation service has no understanding of the semantics of the Dublin Core metadata, this approach has its limitations.
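A malformed identifier like the one noted above is a syntax problem that a short script can catch without a visual display. Python's `urllib.parse.urlparse` fills in the host (`netloc`) only when `//` follows the scheme. An `http` or `https` value with an empty host is therefore suspect. This is a minimal sketch: the function name is hypothetical, and the sample values below are illustrative.

```python
from urllib.parse import urlparse


def check_identifier(value):
    """Return a problem message for a malformed http(s) identifier, else None."""
    parts = urlparse(value.strip())
    # "http:/www.foo.ac.uk/" parses with scheme "http" but an empty netloc,
    # because the host is only recognised after "//".
    if parts.scheme in ("http", "https") and not parts.netloc:
        return f"{value!r}: no host part; check for '//' after '{parts.scheme}:'"
    return None  # other schemes (e.g. urn:) are not checked here
```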
The RDF Validator Tool
dcmeta: An XSLT Approach
Use of XSLT
We have employed XSLT to provide validation of Dublin Core metadata embedded in (X)HTML resources.
The dcmeta XSLT stylesheet:
• Creates a report on the embedded DC metadata
• Checks that general conventions for DC metadata are followed
• Checks the metadata against a specified "application profile" of the DC Metadata Element Set.
The profile is a set of rules which specify:
• Permitted DC properties (e.g. only the 15 DC elements are allowed)
• Minimum/maximum permitted occurrences of a specified property (e.g. only one occurrence of DC.Title permitted)
• Permitted encoding schemes (e.g. DC.Subject properties should have the scheme "LCSH")
• Permitted values (e.g. DC.Publisher must have the value "UKOLN")
These rules are described in a secondary XML document read by the stylesheet.
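The kinds of profile rule listed above can be mirrored in a small checker. This is a hypothetical Python sketch and not dcmeta's implementation. The real profile is an XML document read by the XSLT stylesheet, with its structure documented at the project URL. The dictionary here only restates the example rules: the 15 elements, at most one `DC.Title`, `LCSH` for `DC.Subject` and `UKOLN` for `DC.Publisher`. The input is a list of `(name, scheme, content)` tuples taken from `DC.*` `<meta>` elements.

```python
from collections import Counter

# Hypothetical stand-in for a dcmeta profile document, restating the
# example rules from the poster as a Python dictionary.
PROFILE = {
    "permitted": {
        "title", "creator", "subject", "description", "publisher",
        "contributor", "date", "type", "format", "identifier",
        "source", "language", "relation", "coverage", "rights",
    },
    "occurrences": {"title": (0, 1)},    # element -> (min, max)
    "schemes": {"subject": {"LCSH"}},    # element -> permitted schemes
    "values": {"publisher": {"UKOLN"}},  # element -> permitted values
}


def check_profile(properties, profile):
    """Check DC.-prefixed (name, scheme, content) tuples against a profile."""
    problems = []
    counts = Counter()
    for name, scheme, content in properties:
        element = name.split(".")[1].lower()  # assumes a "DC." prefix
        counts[element] += 1
        if element not in profile["permitted"]:
            problems.append(f"{name}: property not permitted by the profile")
            continue
        schemes = profile["schemes"].get(element)
        if schemes is not None and scheme not in schemes:
            problems.append(f"{name}: scheme {scheme!r} not in {sorted(schemes)}")
        values = profile["values"].get(element)
        if values is not None and content not in values:
            problems.append(f"{name}: value {content!r} not permitted")
    # Occurrence limits are checked once all properties have been counted.
    for element, (low, high) in profile["occurrences"].items():
        if not low <= counts[element] <= high:
            problems.append(
                f"DC.{element}: {counts[element]} occurrence(s), expected {low} to {high}"
            )
    return problems
```

Keeping the rules in data rather than in code means a different service's profile can be checked by swapping the dictionary. The poster's separate profile document suggests the same motivation.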
The dcmeta Tool
Conclusions
Deployment
The stylesheet can be deployed using any XSLT engine, for example:
• Using a JavaScript bookmarklet to apply the transformation in a browser with a built-in XSLT engine (e.g. IE/MSXML)
• As an online service using a server-side transformation
• Run from the command line
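For the server-side or command-line routes, the following sketch shows a stylesheet being applied outside the browser. It uses the third-party `lxml` library's `etree.XSLT`. The inline stylesheet is a hypothetical stand-in that simply lists `DC.*` meta elements from an XHTML page. It is not dcmeta. How dcmeta is told which profile to use is described alongside the stylesheet and is not assumed here. Any XSLT 1.0 processor, for example `xsltproc` on the command line, could apply a stylesheet in the same way.

```python
from lxml import etree

# Hypothetical stand-in stylesheet (NOT dcmeta.xsl): prints one
# "name = content" line per DC.* meta element in an XHTML document.
REPORT_XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//h:meta[starts-with(@name, 'DC.')]">
      <xsl:value-of select="concat(@name, ' = ', @content, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""


def run_stylesheet(xsl_bytes, xhtml_bytes):
    """Apply an XSLT 1.0 stylesheet to a well-formed XHTML document."""
    transform = etree.XSLT(etree.XML(xsl_bytes))
    result = transform(etree.XML(xhtml_bytes))
    return str(result)  # text output method, so this is the report text
```

`etree.XML` requires well-formed input. Ordinary HTML pages would therefore first need converting to XHTML, for example with Tidy, as in the RDF validator chain.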
Summary
This poster summarises a number of approaches to validating Dublin Core metadata embedded in HTML resources. The poster reports on initial work in the development of an XSLT-based tool which can be used for validation of Dublin Core metadata.
Further Details
The stylesheet is available, together with details of the structure of the "profile" document, at <http://www.ukoln.ac.uk/metadata/dcmeta/>
For further information please contact Pete Johnston at the email address <[email protected]>