LAND AND WATER FLAGSHIP Capabilities and Status of the Linked Data Registry technology with recommendations on next steps for vocabulary management Simon J D Cox 12 September 2014 For: Bureau of Meteorology/Water Information Research and Development Alliance Tony Boston, Paul Sheahan


LAND AND WATER FLAGSHIP

Capabilities and Status of the Linked Data Registry technology

with recommendations on next steps for vocabulary management

Simon J D Cox

12 September 2014

For: Bureau of Meteorology/Water Information Research and Development Alliance Tony Boston, Paul Sheahan


Citation

Cox SJD (2014) Capabilities and Status of the Linked Data Registry technology. CSIRO, Australia.

Copyright and disclaimer

© 2014 CSIRO To the extent permitted by law, all rights are reserved and no part of this publication covered by copyright may be reproduced or copied in any form or by any means except with the written permission of CSIRO.

Important disclaimer

CSIRO advises that the information contained in this publication comprises general statements based on scientific research. The reader is advised and needs to be aware that such information may be incomplete or unable to be used in any specific situation. No reliance or actions must therefore be made on that information without seeking prior expert professional, scientific and technical advice. To the extent permitted by law, CSIRO (including its employees and consultants) excludes all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this publication (in part or in whole) and any information or material contained in it.

Revision history

Date        Version  Description                                                  Author
2014-09-01  0.1      Initial version, without executive summary, recommendations  SJDC
2014-09-03  0.2      Complete version                                             SJDC
2014-09-12  0.3      With modifications in response to comments from BoM          SJDC


Foreword

This report has been prepared as deliverable 2.5 for the WIRADA Informatics - Data Services Project 2014-15, as a result of work under Activity 2.2: Water Information Vocabularies.

The overall goal of the project is to use Linked Data approaches to address various challenges in delivery of water data, relating to delivery to both human and machine clients, the creation and maintenance of relationships within and between real-world features and data and other descriptors relating to them, and providing suitable context so data can be used in different ways. Task 2.2 concerns Delivering Water Information Vocabularies. While previous WIRADA projects have developed practices concerning the formalization and delivery of vocabularies of terms and definitions, vocabulary maintenance is currently managed in an ad hoc manner, and is not transparently recorded. Other groups in related fields are also concerned about this issue. In this report we describe and evaluate a technology that shows some promise in addressing the issue.


Contents

Foreword
Acknowledgments
Executive summary
1 Introduction
2 Scope
3 LDR Capabilities
    3.1 Registers, sub-registers
    3.2 Registered items
    3.3 Lifecycle metadata
    3.4 Interface
    3.5 Backup, dump and restore
4 Applications
    4.1 Manage a store of controlled vocabularies
    4.2 Manage other RDF resources
    4.3 Search and query
    4.4 Persistent URIs, redirection
    4.5 Vocabulary maintenance workflow
5 Recommendations
6 Summary
Glossary
References


Acknowledgments

Ben Leighton and Jonathan Yu (CSIRO) configured the test LDR installation. Dave Reynolds (Epimorphics), Jeremy Tandy and Mark Hedley (UK Met Office) provided long-range advice in explaining some of the LDR details, and debugging our test LDR installation.


Executive summary

This report describes the capabilities (and some limitations) of the Linked Data Registry (LDR) software, developed by Epimorphics on behalf of WMO and UK DEFRA as a vocabulary management technology. The methodology has primarily been based on experimenting with a test deployment, supplemented with a review of the available documentation. LDR appears to provide a clean and rigorous solution to the fine-grained maintenance of content in an RDF triple-store, with the status and history recorded and reported. However, it is not a comprehensive vocabulary management solution. In particular, LDR does not replace functionality provided by SISSVoc and PIDsvc. An analysis of the BoM and NEII requirements for vocabulary maintenance is nevertheless a prerequisite for determining the full set of components required for vocabulary management and publication.

The report identifies a number of issues and includes some recommendations for further work in the context of the evaluation and the broader WIRADA project.


1 Introduction

The utility of water data requires that the definitions used in a dataset be available to users of the data. Water information vocabularies are currently published by the Bureau in a variety of forms, but are not available externally in a machine-readable way (e.g. using web services). Providing water vocabularies using web services:

• increases interoperability of data systems by allowing comparability of data from different sources;
• ensures data consumers (e.g. other water agencies, the public) are able to interpret the data correctly, based on agreed-upon definitions.

Interoperability of datasets is enhanced through the use of shared vocabularies, or at least through the availability of mappings from the terminology used in one dataset to that used in another. In both cases, essential requirements are that vocabularies are published to all users, that the items in the vocabularies are persistently identified so that references are reliable, and that the formalization is rich enough to support the expression of relationships, either locally or externally. Hence, the WIRADA Data Services project includes the goal:

Deploy a vocabulary service that provides machine-readable definitions of key hydrological concepts. The service will support governance of these definitions as they change through time. This will allow the Bureau to provide national definitions for core concepts, resulting in improved comparability of data. e.g. is your rainfall observation the same as mine?

Semantic Web technologies provide a basis for formalization of vocabularies, in particular through the Simple Knowledge Organization System (SKOS) [11], which is a Resource Description Framework (RDF) vocabulary [6] for formalizing simple hierarchical vocabularies. The Linked Data principles [3], that information resources should be accessed via persistent URIs and that representations should be provided in standard formats suitable for both human and machine clients (HTML, RDF/XML, Turtle, JSON), along with the SPARQL query language and web protocol [7,9], provide the basis for web distribution of vocabulary content. Previous WIRADA projects have used these technologies to address parts of the vocabulary deployment challenge:

• the Water Regulations vocabularies were converted to RDF/SKOS
• vocabularies in OGC WaterML2, developed through WIRADA, are formalized in RDF/SKOS
• SISSVoc [4,5,8], a web API for SKOS vocabularies, was developed jointly through AuScope and WIRADA.

However, the important issue of maintenance remains. This includes a number of potential ways that vocabularies and their content may be changed, including:

• Adding vocabularies
• Adding items to a vocabulary
• Updating terms and definitions, including relationships (hierarchical and other) with other vocabulary items
• Removing items from a vocabulary
• Adding and updating mappings between items in separate vocabularies

Furthermore, recognizing that vocabularies are not usually static, it is important that users can determine the status of a vocabulary and its contents, and preferably can recover a previous state that may have applied at the time a specific dataset was prepared. The publication environment should therefore make this information available to users.

The technologies used or developed in earlier WIRADA projects do not provide specific support for lifecycle management or versioning of vocabulary terms, and do not make changes and status externally visible. Meanwhile, the UK linked-data community (sponsored by the World Meteorological Organization through the UK Met Office, and data.gov.uk through DEFRA) has commissioned Epimorphics Ltd. to develop a Linked Data Registry service (“LDR”) [13]. The system functionality is based on the ISO 19135 Registry standard (Procedures for item registration) [10]. It provides a mechanism for registering and managing individual information items, in the context of a shared linked-data infrastructure. Initial analysis suggests that it may provide an alternative base for vocabulary services, as well as some overlap with persistent identifier (URI redirection) functionality. Nevertheless, the LDR software has had only limited testing, and the role that it might play in a comprehensive vocabulary management and publication system is not clear.

In this report we evaluate LDR and assess the functionality provided, in particular in terms of overlaps and gaps relative to SISSVoc and other elements of our linked data publication systems (e.g. persistent URI management tools) and any specific gaps and bugs that exist in the implementation.


2 Scope

In this document we report the results of an evaluation of the Linked Data Registry software (LDR). The methodology has primarily been based on experimenting with a test deployment, supplemented with a review of the available documentation. The experiments have focussed primarily on vocabulary applications, but have also considered complementary uses of the LDR for registration of non-vocabulary semantic resources, and persistent URI (URI redirection) management. The report considers:

- functions and mode of operation,
- the Application Programming Interface,
- the User Interface in the context of different types of users,
- consistency of the API with expectations relating to Semantic Web and Linked Data principles,
- overlaps with other components currently used in publication of vocabularies by the Bureau and in other environmental science communities.

The report first summarizes the basic functionality of LDR, and identifies a number of issues that either need resolving in collaboration with the LDR developers or other users, or are basic deficiencies in the capability of LDR for the applications under consideration. Next it considers the use of LDR for a number of applications, starting with the core vocabulary management functions, but also looking at the complementary use-cases. The report finishes with some recommendations for further work in the context of the evaluation and the project.


3 LDR Capabilities

The Linked Data Registry (LDR) was developed to support the publication and maintenance of controlled vocabularies. It includes a repository (data store) for RDF-based resources, and supports registration of external resources of all types, including re-writing or re-basing the external URI if desired. The registration functionality is an implementation of the ISO 19135 workflow and register model [10].

It has been deployed as an operational system by WMO¹ and as pilots by DEFRA² and the UK Met Office³.

Documentation of LDR is provided on the project Wiki [13]. The following sections describe the capabilities and characteristics of LDR, determined primarily as a result of experimentation with a test deployment at http://registry.it.csiro.au, from where the examples below are taken.

3.1 Registers, sub-registers

1. A single registry instance controls a tree of resources with a common base URI which corresponds to the root register.

e.g. http://registry.it.csiro.au

2. A register may contain items and/or sub-registers. Every item or sub-register is owned by one register.

3. Every registered item (including sub-registers) is denoted by a URI whose path implies a nested set of registers relative to the registry URI. The URI for a registered item is the URI for the containing register with the item id appended after a "/". The 'leaf' URI denotes the item, and the register that owns the item is denoted by the URI obtained by trimming the final element.

e.g. http://registry.it.csiro.au contains the sub-registers
http://registry.it.csiro.au/agriculture
http://registry.it.csiro.au/environment
http://registry.it.csiro.au/geoscience
and more

http://registry.it.csiro.au/agriculture/def/CABI-glossary/Adulterate is an item in the register http://registry.it.csiro.au/agriculture/def/CABI-glossary

4. All registration operations are scoped to a single register. One or more items may be added to a register per registration operation. It is not possible to load (or modify) a nested tree or graph in a single registration operation, as nesting implies membership of multiple registers. Operations involving more than one register may be scripted.
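The URI convention in point 3 can be illustrated with a short Python sketch. This is only an illustration of the naming rule, not part of LDR itself: the function names `owning_register` and `register_chain` are hypothetical, and the sketch assumes that every path segment above an item corresponds to a register.

```python
from urllib.parse import urlsplit, urlunsplit


def owning_register(item_uri: str) -> str:
    """Return the URI of the owning register by trimming the final path element."""
    parts = urlsplit(item_uri)
    path = parts.path.rstrip("/")
    if not path:
        raise ValueError("the root register has no owning register")
    parent_path = path.rsplit("/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


def register_chain(item_uri: str) -> list[str]:
    """List the nested registers from the immediate owner up to the root register."""
    chain = []
    current = item_uri
    # Keep trimming until only the registry base URI (empty path) is left.
    while urlsplit(current).path.strip("/"):
        current = owning_register(current)
        chain.append(current)
    return chain


if __name__ == "__main__":
    item = "http://registry.it.csiro.au/agriculture/def/CABI-glossary/Adulterate"
    for register in register_chain(item):
        print(register)
```

Because each registration operation is scoped to one register, a script that loads a nested structure would need to visit such a chain from the root downwards, one register at a time.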

¹ http://codes.wmo.int/
² http://environment.data.gov.uk/registry/
³ http://reference.metoffice.gov.uk/


3.2 Registered items

1. Items of any type that are formalised in RDF may be stored in the registry's local triple store (e.g. SKOS concepts). Items persisted this way are known as locally managed.

2. Existing resources available on the web in any format may be registered locally. Items registered in this way are known as externally managed. The registry URI for an externally managed item provides an alternative, locally-based URI, hence registration implies local endorsement of externally managed content. When resolved, the local URI redirects to the external URI. The HTTP code for redirection can be configured on an item-by-item basis (302, 307 and 200 (proxy) are the most common).

e.g. http://registry.it.csiro.au/ogc/doc/gml/3.2.1 is a local URI that redirects to http://portal.opengeospatial.org/files/20509

Issue 1 - Documentation: Patterns for registration of externally managed resources are poorly documented. This applies in particular to the patterns within the batch loading file for controlling the localName of remotely managed items.

3. An item belonging to one register may also be added to other registers in the registry. In this way existing items may be re-collated into new collections.

e.g. http://registry.it.csiro.au/agriculture/def/CABI-glossary-subset is a subset of specified items from http://registry.it.csiro.au/agriculture/def/CABI-glossary

Issue 2 - Documentation: Patterns for adding locally managed content to additional registers are poorly documented. This applies in particular to the patterns within the batch loading file for controlling the localName of items added to new registers.

3.3 Lifecycle metadata

1. Each registered item, either locally or externally managed, is represented in the registry by a register-item, which is a metadata record for the registered item. All lifecycle information (status, dates, submitter) is recorded as part of the register-item. The register-item is visible to anyone with permission to view the item itself. In LDR a register-item is denoted by a URI with an underscore "_" as the first character of its localName (the final element of the URI), so resources whose own URIs already have a localName beginning with an underscore are incompatible with LDR. For locally managed items, the names are usually related.

e.g. http://registry.it.csiro.au/agriculture/def/CABI-glossary/_Adulterate is the Register Item for http://registry.it.csiro.au/agriculture/def/CABI-glossary/Adulterate

2. The status of a registered item is set explicitly (submitted, invalid, experimental, stable, superseded, retired) and recorded in the corresponding register-item. State transitions must follow a logical sequence⁴.

3. Registered items may be modified or replaced. Previous versions of the item may be retrieved

– at a URI that appends the desired version number

e.g. http://registry.it.csiro.au/agriculture/def/CABI-glossary/Adulterate has multiple versions. Version #2 is identified by http://registry.it.csiro.au/agriculture/def/CABI-glossary/_Adulterate:2

– through a GET request with a query string _versionAt={dateTime}

⁴ https://github.com/UKGovLD/ukl-registry-poc/wiki/Principles-and-concepts#status-and-life-cycle


4. The status of a registered item may be set to 'superseded' (if replaced by an item with a different URI) or 'retired', but an item may not be deleted. The history of a retired or superseded item will remain visible.

5. Maintenance access to each item (including sub-registers) is controlled through user permissions, typically set on the register that contains it. Permissions can also be set at the item level. Permissions are grouped into roles defined in the security model⁵.

Issue 3 - Configuration: The security model should be examined carefully to determine if it matches the requirements for vocabulary maintenance in the Bureau. This will concern both (a) status values; (b) status sequence. However, this should be done in conjunction with a complete workflow analysis.
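The naming conventions for register-items and versions described in points 1 and 3 of this section can be sketched in Python as follows. The helper names are hypothetical. Two assumptions are made that the source does not confirm: that `_versionAt` is applied to the item URI itself, and that the `{dateTime}` value is an ISO 8601 timestamp.

```python
from urllib.parse import urlencode


def register_item_uri(item_uri: str) -> str:
    """Derive the register-item (metadata record) URI by prefixing the localName with '_'."""
    register, _, local_name = item_uri.rstrip("/").rpartition("/")
    if local_name.startswith("_"):
        # Such names collide with the register-item convention.
        raise ValueError("localName already starts with '_'")
    return f"{register}/_{local_name}"


def version_uri(item_uri: str, version: int) -> str:
    """URI of a numbered version: the register-item URI followed by ':' and the number."""
    return f"{register_item_uri(item_uri)}:{version}"


def version_at_uri(item_uri: str, when: str) -> str:
    """URI for the state at a given time (assumed: ISO 8601 value on the item URI)."""
    return f"{item_uri}?{urlencode({'_versionAt': when})}"


if __name__ == "__main__":
    item = "http://registry.it.csiro.au/agriculture/def/CABI-glossary/Adulterate"
    print(register_item_uri(item))
    print(version_uri(item, 2))
    print(version_at_uri(item, "2014-09-01T00:00:00Z"))
```

The guard in `register_item_uri` reflects the incompatibility noted in point 1: a resource whose own localName begins with an underscore cannot be distinguished from a register-item.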

3.4 Interface

The contents of the registry may be created, maintained, and inspected either through a browser-based UI, or through a REST API.

3.4.1 API

1. The LDR REST API⁶ uses HTTP GET/POST/PUT/DELETE operations on the URL for the item, or its container.

2. Request parameters may be passed in a query string using the following Linked Data API (LDA) keys:

a. _format=(rdf|ttl|jsonld)
b. _page={int}
c. _view=(with_metadata|snapshot|version_list)

And a set of keys specific to the registry functionality:

d. _versionAt={dateTime}
e. batch-referenced
f. batch-managed
g. entity={uri}
h. non-member-properties
i. query={text}
j. status={status}
k. tag
l. update
m. validate

Some functionality specified in LDA is currently not included in LDR (additional content negotiation options, language selection).

3.Configuration of some aspects of an LDR instance (collection-types, data entry forms) is achieved by registration of special .ttl files in reserved locations.

– e.g. see http://registry.it.csiro.au/system for system registers
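As an illustration of point 2, the following Python sketch assembles a request URL from the keys that take a value, and rejects values for `_format` and `_view` that are not among those listed above. The function name `api_url` is hypothetical, and the sketch only builds the URL; it does not contact a registry or handle authentication.

```python
from urllib.parse import urlencode

# Permitted values for the Linked Data API keys listed in this section.
ALLOWED_VALUES = {
    "_format": {"rdf", "ttl", "jsonld"},
    "_view": {"with_metadata", "snapshot", "version_list"},
}


def api_url(resource_uri: str, params: dict) -> str:
    """Append key=value request parameters to a register or item URL."""
    for key, value in params.items():
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
    query = urlencode(params)
    return f"{resource_uri}?{query}" if query else resource_uri


if __name__ == "__main__":
    register = "http://registry.it.csiro.au/agriculture/def/CABI-glossary"
    print(api_url(register, {"_format": "ttl", "_view": "with_metadata"}))
    print(api_url(register, {"status": "stable", "_page": 1}))
```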

⁵ https://github.com/UKGovLD/ukl-registry-poc/wiki/Security-model
⁶ https://github.com/UKGovLD/ukl-registry-poc/wiki/Api


3.4.2 UI

4. The UI provides multiple views of a register, registered item, or register-item. The default styling is oriented towards display of a lot of registration detail.

Issue 4 - Deployment: The registry UI should be simplified for casual users. While the default view is necessary for maintenance (which is the primary use-case for the LDR), it is not suitable for inspection of registry content by users who are not primarily concerned with the registration process.

Issue 5 - Security: Examination of existing deployments (at WMO) shows that functionality that is disallowed to non-authenticated users (e.g. buttons triggering lifecycle dialogues) is made non-visible in the UI through CSS styling, but remains available by inspection of the page source. This is a potential security hole, or at least an invitation, since it provides API information to non-authorized users. Care must be taken to fully suppress disallowed interface elements.

5. The UI provides data entry forms, or a dialogue to upload a local .ttl file. The primary data-entry form creates a simple SKOS concept (i.e. with only a single label and description, and no relationships).

Issue 6 - Documentation or bug: Configuration of data entry forms is poorly documented and hard to debug. Multiple form configurations have been loaded during testing, but only one of them has led to successful creation of a form for data entry. Additional development/documentation is required.

6. User accounts and permissions are administered through the registry UI.

Issue 7 - Authentication: Identity management was originally designed to use Google's OpenID 2.0 service. Google subsequently disabled this service for new applications, so a local username/password system was implemented as a stop-gap. This needs further debugging, and a strategy for securing the initial data user credentials must be developed (by default account information is an open text file, which was acceptable when authentication was handled externally, but is incompatible with a password-based system).

3.5 Backup, dump and restore

1. A backup of the current state of a complete registry instance, formatted as N-Quads, can be generated using a button in the standard UI.

2. All data in a registry instance (including register-items, locally managed content, user information, system configuration, etc.) is persisted in a local Jena triple-store, which is stored in a single file. This can be copied, archived, and replaced through file-system-level operations, in order to restore the registry to its state when the copy was made.

3. Finer-grained export of registry content can be achieved using the SPARQL interface.

Issue 8 - Functionality: SPARQL query patterns for partial dumps are needed.
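One possible shape for such a partial-dump pattern is sketched below in Python. It is untested against a live registry and rests on several assumptions: that all resources in a register share the register URI as a prefix (per section 3.1), that the relevant triples are visible in the endpoint's default graph (a `GRAPH ?g { ... }` wrapper may be needed otherwise), and that the endpoint is the one quoted in section 4.3. The function names are hypothetical.

```python
from urllib.parse import urlencode

# SPARQL endpoint of the test deployment, as quoted in section 4.3 (HTTP GET only).
SPARQL_ENDPOINT = "http://registry.it.csiro.au/system/query"


def partial_dump_query(register_uri: str) -> str:
    """Build a CONSTRUCT query selecting triples whose subject lies under a register."""
    if any(ch in register_uri for ch in '"<>\\ \n'):
        raise ValueError("register_uri contains characters that are not safe to embed")
    prefix = register_uri.rstrip("/") + "/"
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        "WHERE {\n"
        "  ?s ?p ?o .\n"
        '  FILTER(STRSTARTS(STR(?s), "' + prefix + '"))\n'
        "}"
    )


def dump_request_url(register_uri: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a SPARQL protocol GET request URL (not sent here)."""
    return f"{endpoint}?{urlencode({'query': partial_dump_query(register_uri)})}"


if __name__ == "__main__":
    print(partial_dump_query("http://registry.it.csiro.au/agriculture/def/CABI-glossary"))
```

Note that this pattern selects by subject URI only, so it would include register-items (whose localNames begin with "_") but would miss blank-node structures attached to the selected subjects.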


4 LDR Applications

4.1 Manage a store of controlled vocabularies

The reference application for LDR is management of a store of controlled vocabularies that are formalized using SKOS. With each item denoted by a URI represented by a separate register-item, LDR enables fine-grained management of the membership and state of RDF resources including individual SKOS Concepts and Collections. LDR provides what is effectively a content management system for RDF resources, and for links to external resources.

A data-entry form available by default in the UI can be used to create a SKOS Concept with a small set of properties (label, definition). Alternatively, the API can be used to load data (formalized as skos:Concepts, or using any other RDF vocabulary) pre-formatted in Turtle [1], with additional properties from SKOS or from another RDF vocabulary. Note that, while a register may be configured to limit registration to resources of a specified type(s) (i.e. to limit

{resourceURI} rdf:type {typeURI} .

statements to specific typeURIs), all other RDF statements will be accepted during registration (this is consistent with the RDF ‘open-world’ principles).
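A minimal Python sketch of the API-based loading route is given below. It serializes one SKOS Concept as Turtle and prepares, without sending, an HTTP POST to the register URL. The assumption that registration is a POST of a `text/turtle` payload to the register should be checked against the LDR API documentation, and authentication is omitted. The concept URI, label and definition are hypothetical placeholders, as are the function names.

```python
import urllib.request


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def skos_concept_turtle(concept_uri: str, label: str, definition: str, lang: str = "en") -> str:
    """Serialize a minimal skos:Concept; rdfs:label is included because LDR search uses it."""
    return (
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
        f"<{concept_uri}> a skos:Concept ;\n"
        f"    rdfs:label {turtle_literal(label)}@{lang} ;\n"
        f"    skos:prefLabel {turtle_literal(label)}@{lang} ;\n"
        f"    skos:definition {turtle_literal(definition)}@{lang} .\n"
    )


def registration_request(register_uri: str, turtle: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST of the Turtle payload to the register (assumed pattern)."""
    return urllib.request.Request(
        register_uri,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


if __name__ == "__main__":
    register = "http://registry.it.csiro.au/agriculture/def/CABI-glossary"
    ttl = skos_concept_turtle(register + "/ExampleTerm", "Example term", "Placeholder definition.")
    print(ttl)
    print(registration_request(register, ttl).get_method())
```

Any further statements added to the Turtle (for example skos:broader links) would be accepted alongside the typed statement, in line with the open-world behaviour noted above.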

Under the LDR model, reg:Register/member maps most naturally to skos:Collection/member. The LDR URI pattern implies that the parent URI for a skos:Concept is the skos:Collection that owns it, which is a common though not ubiquitous pattern in SKOS vocabulary design. However, it is not necessary to use the skos:Collection type at all if not required by the particular vocabulary application.

There is no clear pattern for the use of skos:ConceptScheme in the context of LDR.

Since permissions can be set at item or register level, management of specific registers, or even items within registers, can be delegated flexibly.

4.2 Manage other RDF resources

Resources that are formalized using any RDF vocabulary may be managed in the registry. Data entry forms can be configured for convenient entry in the UI by designated maintainers. For RDF resources, LDR provides the capability generally associated with a 'content management system'.

For example, as a test of the capability for non-SKOS applications, an RDFS vocabulary for the normative elements in OGC standards (requirements, conformance tests) has been designed. Content using this ontology can be maintained as locally managed items in appropriate registers, with URIs matching the OGC policy in this area. e.g. http://registry.it.csiro.au/ogc/spec/omxml contains a subset of the requirements and tests from the O&M XML specification, with URIs that match the structure in the OGC specification.

4.3 Search and query

LDR provides a limited search capability. Searching for items with a specified word or phrase in the rdfs:label is available both in the API and through a text entry field in the default UI.

However, LDR does not provide any query functionality that leverages the structure provided by SKOS (for example, search for broader or narrower terms). As query is not a registration function, it is outside the scope of LDR.


Nevertheless, a SPARQL end-point [7] is provided for the content managed by an LDR (e.g. http://registry.it.csiro.au/system/query, which accepts HTTP GET only). This can support an application or user interface built on SPARQL, such as SISSVoc, which provides specific vocabulary query functionality based on the SKOS vocabulary [5]. Hence, use of LDR for registration, update and maintenance is compatible with a complete system that provides other functionality through SPARQL-based applications.

Issue 9 - Deployment: Determine if the search capability meets BoM needs. For additional SKOS-based query, the possibility of deploying a SISSVoc API over a SPARQL endpoint managed by LDR needs to be assessed.
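To indicate the kind of SKOS-aware query that falls outside LDR's own search but could be run over its SPARQL endpoint, the Python sketch below builds a SPARQL 1.1 query for the narrower terms of a concept. It assumes the vocabulary records hierarchy with skos:broader and/or skos:narrower and that the endpoint supports property paths; neither has been verified for the test deployment. The function name is hypothetical.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def narrower_terms_query(concept_uri: str, transitive: bool = False) -> str:
    """Build a SELECT query for terms narrower than concept_uri.

    The path follows skos:broader from the narrower term, or skos:narrower
    in reverse, so either modelling style is covered.
    """
    if any(ch in concept_uri for ch in '<>" \n'):
        raise ValueError("concept_uri contains characters that are not safe to embed")
    path = "(skos:broader|^skos:narrower)" + ("+" if transitive else "")
    return (
        f"PREFIX skos: <{SKOS_NS}>\n"
        "SELECT DISTINCT ?term ?label\n"
        "WHERE {\n"
        f"  ?term {path} <{concept_uri}> .\n"
        "  OPTIONAL { ?term skos:prefLabel ?label }\n"
        "}"
    )


if __name__ == "__main__":
    concept = "http://registry.it.csiro.au/agriculture/def/CABI-glossary/Adulterate"
    print(narrower_terms_query(concept, transitive=True))
```

A SISSVoc deployment would offer this kind of query through a fixed web API rather than requiring clients to write SPARQL, which is the trade-off Issue 9 asks to be assessed.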

4.4 Persistent URIs, redirection

Any item with a web address can be registered, with registration assigning an alternative URI in the registry domain. The registry URI is a persistent URI since no registered item can be deleted. This is potentially useful to provide ‘Cool URIs’ [2,14] for items with un-memorable or unstable URLs, and also to indicate 'endorsement' of the item by the register owner. Within a particular community, the registry URI may be more convenient or memorable than the native URL.

4.4.1 1:1 REDIRECTION

For example, the various versions of the GML standard have non-intuitive (opaque) URLs

http://portal.opengeospatial.org/files/7197
http://portal.opengeospatial.org/files/1034
http://portal.opengeospatial.org/files/1108

etc. These may be assigned persistent URIs by registering them as items in a single register, using the version number as the localName, providing more memorable persistent URIs, consistent with their well-known identity:

http://registry.it.csiro.au/ogc/doc/gml/1.0.0
http://registry.it.csiro.au/ogc/doc/gml/2.0
http://registry.it.csiro.au/ogc/doc/gml/2.1.1

A set of URI redirections ('delegated register items') like this can be uploaded as a batch into a single register. However, the process of assigning ‘memorable’ URIs on a 1:1 basis, not derived from some aspect of the original locator, is only practical for small numbers of items (hundreds or even thousands, but not millions).

4.4.2 PARTIAL PATH REDIRECTION

A register can also be configured for ‘namespace-forwarding’, such that a complete set of external resources with a common base URI can be re-based with a register URI. For example, the name of any geologic timescale era can be appended to the register URI

http://registry.it.csiro.au/geoscience/classifier/timescale

which then redirects to http://resource.geosciml.org/classifier/ics/ischart/{EraName}, which is the official IUGS identifier for timescale elements. For example

http://registry.it.csiro.au/geoscience/classifier/timescale/Cambrian

http://registry.it.csiro.au/geoscience/classifier/timescale/Cretaceous
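
Namespace-forwarding can be pictured as a prefix substitution: whatever follows the register URI is appended to the target base. The following is a minimal sketch of that behaviour using the two base URIs quoted above; the function name is hypothetical and this is not LDR's implementation.

```python
from typing import Optional

REGISTER_URI = "http://registry.it.csiro.au/geoscience/classifier/timescale"
FORWARD_BASE = "http://resource.geosciml.org/classifier/ics/ischart"


def forward(uri: str) -> Optional[str]:
    """Re-base a URI under the register onto the forwarding target."""
    prefix = REGISTER_URI + "/"
    if not uri.startswith(prefix):
        return None  # not in this register's namespace
    tail = uri[len(prefix):]
    if not tail:
        return None  # nothing to forward
    return FORWARD_BASE + "/" + tail


print(forward(REGISTER_URI + "/Cambrian"))
```

Unlike the 1:1 case, no per-item entry is needed, so the size of the external set does not matter.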

4.4.3 COMPARISON WITH OTHER SOLUTIONS

1:1 and partial path redirection are the cases managed by PURL [15], so LDR appears to be a replacement for PURL installations. These cases are likely to apply in many document publishing scenarios, and probably in most vocabulary publishing cases. LDR exceeds PURL in making the lifecycle information transparently available to users, and also in making all registration and RDF content data available for low-level query through the SPARQL endpoint.

However, LDR does not support the functionality of more advanced URI resolver software, like Apache mod_rewrite⁷ or the SISS PIDsvc⁸, both of which are currently used in BoM. Those systems also support redirection patterns based on regular expressions, and other rules (for example, rule inheritance) which are outside the scope of LDR. While LDR can be used to manage redirections, it should probably be limited to URI sets related to relatively small sets of controlled resources, like vocabularies.
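
For contrast, the sketch below shows the kind of regular-expression rule that such resolvers can express and that LDR cannot. The rules, paths and target host are entirely hypothetical, and the notation is plain Python rather than mod_rewrite or PIDsvc configuration syntax.

```python
import re
from typing import Optional

# Hypothetical rules: (pattern, replacement template). First match wins.
RULES = [
    (re.compile(r"^/vocab/([a-z]+)/(\d{4})/(.+)$"),
     r"http://example.org/\1/v\2?term=\3"),
    (re.compile(r"^/vocab/(.+)$"),
     r"http://example.org/current/\1"),
]


def rewrite(path: str) -> Optional[str]:
    """Apply the first matching rule and return the redirect target."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            return match.expand(template)
    return None


print(rewrite("/vocab/units/2014/metre"))
```

A single rule here rearranges captured path segments into a query string, which neither 1:1 nor partial path redirection can do.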

4.5 Vocabulary maintenance workflow

A vocabulary submission and review process could, in principle, be managed directly in the LDR, since any user can be granted insertion, modification, status-update or other necessary permissions for any register. However, the functionality of LDR is tightly scoped: lifecycle is transparent, but the application does not provide any comment or discussion capability, and does not verify links or compute semantic closure. Furthermore, other features of LDR introduce side-effects that make use of LDR for the complete management of the vocabulary maintenance process practical only in limited cases. For example, the 'no deletions' feature, along with the standard status-transition sequence, means that a broadly accessible submission process would risk cluttering a registry with candidate submissions that are later deemed 'invalid', but which would nevertheless

- be visible by default in the standard UI to any logged-in user

- prevent the re-use of useful URIs, since 'invalid' is a final state for a registered item⁹.
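
The two side-effects can be seen in a toy model of a register with no delete operation and a terminal 'invalid' status. Only those two properties come from the text; the other status names and the transition table are illustrative assumptions, not the LDR lifecycle definition.

```python
# Allowed status transitions. Only the finality of 'invalid' is taken from
# the report; the remaining names and transitions are assumptions.
TRANSITIONS = {
    "submitted": {"valid", "invalid"},
    "valid": {"superseded", "retired"},
    "superseded": set(),
    "retired": set(),
    "invalid": set(),  # final state: no way back
}


class Register:
    """Toy register: items can be added and re-statused, never deleted."""

    def __init__(self) -> None:
        self._items = {}  # localName -> status

    def submit(self, local_name: str) -> None:
        if local_name in self._items:
            raise ValueError(f"URI already in use: {local_name}")
        self._items[local_name] = "submitted"

    def set_status(self, local_name: str, new_status: str) -> None:
        current = self._items[local_name]
        if new_status not in TRANSITIONS[current]:
            raise ValueError(f"{current} -> {new_status} is not allowed")
        self._items[local_name] = new_status

    def visible_items(self) -> dict:
        """Everything ever registered, including 'invalid' items."""
        return dict(self._items)


register = Register()
register.submit("flow-rate")
register.set_status("flow-rate", "invalid")
print(register.visible_items())
```

After rejection, the item still appears in the listing and its localName cannot be submitted again.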

A complete vocabulary maintenance workflow will require a range of additional elements going significantly beyond those provided immediately or in reasonable prospect by LDR. The tools required will depend on the vocabulary maintenance process. As well as

1. an RDF publication and versioning component like LDR, and

2. a query API like SISSVoc,

a complete system will probably also include

3. a process for development of consensus around vocabulary content within the relevant community, supported by collaboration tools for discussion and issue tracking (Wiki, ticket system)

4. an IDE that enables inspection of content proposals in the context of a complete vocabulary and related vocabularies. This would normally require computation of closure of a set of RDF resources, and probably other reasoning functions. (e.g. Protégé, TopBraid Composer)

5. a VCS for management of data prior to loading into the registry (e.g. Git, Subversion)

While LDR appears to provide an elegant and flexible system for fine-grained publication and versioning of vocabulary content in a system built on standard linked-data technologies, it does not address many other elements likely to be required for a full vocabulary maintenance workflow.

7 http://httpd.apache.org/docs/current/mod/mod_rewrite.html
8 https://www.seegrid.csiro.au/wiki/Siss/PIDServiceUserGuide
9 Under the standard configuration, an item whose status has been set to 'invalid' cannot be revived even if the intention is to completely change its content.

Issue 10 - Workflow and system analysis: Workflow required for development, maintenance, and publication of vocabularies covering the use cases known or anticipated for vocabularies used for data managed exclusively in the Bureau and also for vocabularies that need to be managed by a broader community in connection with NEII.

There are other vocabulary maintenance and publication products available. In a recent study [5], we identified PoolParty [16] and SKOSMOS [12] as having a scope most similar to SISSVoc. Both of these also include a vocabulary maintenance interface. PoolParty is a commercial product, while SKOSMOS is open source (PHP). TemaTres [17] provides another PHP-based option.

Finally, RDF-based vocabulary management and publication is a special case of web content management. There are many systems available for this, with the open source PHP-based Drupal¹⁰ system in use in a number of Australian Government agencies. Drupal is also used for application development in some of the most innovative research institutions working in the Semantic Web (e.g. the Tetherless World Constellation¹¹ at Rensselaer Polytechnic Institute), though it is understood that this use does not reach far into the semantic stack, which is generally managed separately from more general web-site content and applications.

Issue 11 - Tool analysis: Comparison required of the capabilities of vocabulary publication and maintenance components, to provide a basis for developing the best stack of tools for a vocabulary publication system meeting the requirements identified in the workflow analysis.

10 https://www.drupal.org/
11 http://tw.rpi.edu/

5 Recommendations

Eleven issues were identified in the analysis above. It is recommended that these be resolved as follows:

1. Documentation: Patterns for registration of externally managed resources are poorly documented, in particular the patterns within the batch loading file for controlling the localName of remotely managed items.
   Response: Work with designers and developers of LDR to improve documentation.¹²

2. Documentation: Patterns for registration of externally managed resources are poorly documented, in particular the patterns within the batch loading file for controlling the localName of remotely managed items.
   Response: Ditto.

3. Configuration: The security model should be examined carefully to determine if it matches the requirements for vocabulary maintenance in the Bureau. This will concern both (a) status values and (b) status sequence. However, this should be done in conjunction with a complete workflow analysis.
   Response: Examine security model in context of workflow analysis.¹³

4. Deployment: The registry UI should be simplified for casual users. While the default view is necessary for maintenance (which is the primary use-case for the LDR), it is not suitable for inspection of registry content by users who are not primarily concerned with the registration process.
   Response: Develop simplified UI as part of a BoM LDR deployment.

5. Security: Examination of existing deployments (at WMO) shows that functionality that is disallowed to non-authenticated users (e.g. buttons triggering lifecycle dialogues) is made non-visible in the UI through CSS styling, but remains available by inspection of the page source. This is a potential security hole, or at least an invitation, since it provides API information to non-authorized users. Care must be taken to fully suppress disallowed interface elements.
   Response: Audit API and UI related to content security concerns. Note: backup and recovery may be an appropriate security approach.

6. Documentation or bug: Configuration of data entry forms is poorly documented and hard to debug. Multiple forms configurations have been loaded during testing, but only one of them has led to successful creation of a form for data entry. Additional development/documentation is required.
   Response: Work with designers and developers of LDR to verify that the form configuration mechanism is functional and improve documentation.

7. Authentication: Identity management was originally designed to use Google's OpenID 2.0 service. Google subsequently disabled this service for new applications, so a local username/password system was implemented as a stop-gap. This needs further debugging, and a strategy for securing the initial data user credentials must be developed (by default account information is an open text file, which was acceptable when authentication was handled externally, but incompatible with a password-based system).
   Response: Complete new authentication implementation.

8. Functionality: SPARQL query patterns for partial dumps needed.
   Response: Develop SPARQL queries for partial dumps, e.g. single registers or nested trees.

9. Deployment: Determine if search capability meets BoM needs. For additional SKOS-based query, the possibility of deploying a SISSVoc API over a SPARQL endpoint managed by LDR needs to be assessed.
   Response: Compare search capability with BoM requirements. Deploy and test a SISSVoc-over-LDR.

10. Workflow and system analysis: Workflow required for development, maintenance, and publication of vocabularies covering the use cases known or anticipated for vocabularies used for data managed exclusively in the Bureau and also for vocabularies that need to be managed by a broader community in connection with NEII.
   Response: Map the component types identified here to the workflows.

11. Tool analysis: Comparison required of the capabilities of vocabulary publication and maintenance components, to provide a basis for developing the best stack of tools for a vocabulary publication system meeting the requirements identified in the workflow analysis.
   Response: Tabulate the capabilities of SISSVoc, LDR, PIDsvc, SKOSMOS, PoolParty, and Drupal, relating to their potential application within a RESTful vocabulary publication and maintenance system.

12 Some registration patterns are illustrated on this page: https://github.com/UKGovLD/ukl-registry-poc/wiki/Api-example:-registration but further elaboration is needed.
13 N.B. Lifecycle customization is on the GitHub issue list: https://github.com/UKGovLD/registry-core/issues/21
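
As an indication of what a partial-dump query pattern (Issue 8) might look like, the sketch below generates a SPARQL 1.1 CONSTRUCT query selecting all triples whose subject is a given register or lies beneath it in the URI hierarchy. It assumes that registered items and sub-registers have URIs extending the register URI and that the data is in the default graph; it is not a pattern documented for LDR, and the function name is hypothetical.

```python
def partial_dump_query(register_uri: str) -> str:
    """Build a CONSTRUCT query for one register and everything nested in it."""
    # Reject characters that would break out of the IRI or string literal.
    if any(ch in register_uri for ch in '<>"\\ \n'):
        raise ValueError("unsafe character in register URI")
    base = register_uri.rstrip("/")
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        "WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER(?s = <{base}> || STRSTARTS(STR(?s), "{base}/"))\n'
        "}"
    )


print(partial_dump_query("http://registry.it.csiro.au/ogc/doc/gml"))
```

If the registry keeps content in named graphs, a `GRAPH` clause would also be needed; that detail would have to be confirmed against the deployed store.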

6 Summary

In this report we have described the capabilities (and some limitations) of the LDR technology. We have focussed on its functionality applied to vocabulary content. The evaluation was triggered in response to suggestions that LDR may provide a single-component replacement for multiple elements used in current vocabulary publication, and also support a more transparent and orderly approach to vocabulary maintenance.

LDR appears to provide a clean and rigorous solution to one element of a process to maintain vocabulary content formalized using SKOS: fine-grained maintenance of content in an RDF triple-store, with the status and history recorded and reported.

The advantages over the current arrangements are

- maintenance at both register and item level is transparent, and lifecycle and status information is explicit

- maintenance of individual registers can be easily delegated to different accounts

- a single system can replace the functionality of multiple loosely coupled components, reducing the brittleness and coordination challenges

However, LDR does not support certain functionality provided in currently deployed systems by SISSVoc and PIDsvc, and it is unlikely that even these three components are sufficient to support a complete vocabulary management system. It is not possible to determine the full set of components required without a more rigorous analysis of the BoM and NEII requirements for vocabulary maintenance.

In addition, the documentation and aspects of the implementation of LDR are still incomplete, and some recommendations for remediating these are provided.

Glossary

API: Application Programming Interface - operations and interfaces provided to access the product functionality from another application

IDE: Interactive Development Environment - software providing a convenient UI for designing a technical product, such as code or formatted data

LDR: Linked Data Registry - the software under review in this document

register: container for a set of items managed under a single governance regime, often of a homogeneous type

registered item: a member of a register

register-item: the registration record for a registered item

registry: a system used to host and manage one or more registers

RDF: Resource Description Framework - W3C standard for knowledge representation as a set of 'triples' joined together into a 'graph'

SKOS: Simple Knowledge Organization System - W3C standard vocabulary for organizing a set of definitions and simple relationships between them; an RDF application

SPARQL: SPARQL Protocol and RDF Query Language - W3C standard low-level query language for RDF data

sub-register: register that is a member of (owned or contained by) another register

UI: User Interface

VCS: Version Control System

References

[1] D. Beckett, T. Berners-Lee, E. Prud’hommeaux, G. Carothers, RDF 1.1 Turtle, World Wide Web Consortium, 2014, Available at: http://www.w3.org/TR/2014/REC-turtle-20140225/.

[2] T. Berners-Lee, Hypertext Style: Cool URIs don’t change. [Online]. Available at: http://www.w3.org/Provider/Style/URI.html. [Accessed: 03-Sep-2014].

[3] T. Berners-Lee, Linked Data - Design Issues, W3C Design Issues. [Online]. Available at: http://www.w3.org/DesignIssues/LinkedData.html. [Accessed: 13-Feb-2014].

[4] S.J.D. Cox, K. Mills, F. Tan, Vocabulary services to support scientific data interoperability, in: Geophys. Res. Abstr. Proc. EGU Gen. Assem., European Geoscience Union, 2013.

[5] S.J.D. Cox, J. Yu, T. Rankine, SISSVoc: A Linked Data API for access to SKOS vocabularies, Semant. Web J. submitted (2014).

[6] R. Cyganiak, D. Wood, M. Lanthaler, RDF 1.1 Concepts and Abstract Syntax, (2014).

[7] L. Feigenbaum, G.T. Williams, K.G. Clark, E. Torres, SPARQL 1.1 Protocol, Cambridge, Mass. USA, 2013, Available at: http://www.w3.org/TR/sparql11-protocol/.

[8] J. Githaiga, G. Duclaux, S.J.D. Cox, J. Yu, Spatial Information Services Stack (SISS) Vocabulary Service – A Tool For Managing Earth & Environmental Sciences Controlled Vocabularies, in: eResearch Australasia, 2010.

[9] S. Harris, A. Seaborne, SPARQL 1.1 Query Language, World Wide Web Consortium, 2013, Available at: http://www.w3.org/TR/sparql11-query/.

[10] ISO/TC-211, ISO 19135:2005 - Geographic information -- Procedures for item registration, 2005, Available at: http://www.iso.org/iso/catalogue_detail.htm?csnumber=32553.

[11] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, World Wide Web Consortium, 2009, Available at: http://www.w3.org/TR/skos-reference/.

[12] National Library of Finland, Skosmos REST API. [Online]. Available at: https://github.com/NatLibFi/Skosmos/wiki/REST-API. [Accessed: 25-Aug-2014].

[13] D. Reynolds, Linked Data Registry. [Online]. Available at: https://github.com/UKGovLD/ukl-registry-poc/wiki.

[14] L. Sauermann, R. Cyganiak, Cool URIs for the Semantic Web. [Online]. Available at: http://www.w3.org/TR/cooluris/. [Accessed: 13-Feb-2014].

[15] PURL Home Page. [Online]. Available at: https://purl.org/docs/index.html. [Accessed: 01-Sep-2014].

[16] PoolParty Manual. [Online]. Available at: https://grips.semantic-web.at/display/POOLDOKU/Introduction;jsessionid=0250F458A2F3CD92FEE6CC345962DF58. [Accessed: 28-Aug-2014].

[17] TemaTres Controlled Vocabulary server. [Online]. Available at: http://www.vocabularyserver.com/. [Accessed: 12-Sep-2014].

CONTACT US

t 1300 363 400 +61 3 9545 2176 e [email protected] w www.csiro.au

YOUR CSIRO

Australia is founding its future on science and innovation. Its national science agency, CSIRO, is a powerhouse of ideas, technologies and skills for building prosperity, growth, health and sustainability. It serves governments, industries, business and communities across the nation.

FOR FURTHER INFORMATION

Land and Water Flagship Simon J D Cox t +61 3 9252 6342 e [email protected] w www.csiro.au/Land-and-Water