
Resource discovery with the VO registry


Astron. Nachr. / AN 329, No. 3, 304 – 306 (2008) / DOI 10.1002/asna.200710938


M.J. Graham*

California Institute of Technology, Mail Code 158-79, Pasadena, CA 91125, USA

Received 2007 Aug 31, accepted 2007 Dec 2
Published online 2008 Feb 25

Key words methods: data analysis

This article describes the VO registry concept and how it can be used to find resources – data sets, services and infrastructure – to support follow-up activities. It also discusses how VOEvent infrastructure components can be incorporated.

© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

VOEvent is, by its nature, a very ephemeral construct and so far most discussions involving it have concentrated on how to generate event packets. However, as we move into an era where event streams are commonplace, we need to address what to do once we have received an event. Specifically we are interested in determining what astronomical resources exist to support follow-up activities. This refers not just to locating telescopes and instruments for further observations but also to finding additional data sets and services that can be used for data mining and for constructing derived data products such as light curves, and to finding supporting infrastructure that provides facilities such as dynamic storage for data caching. Fortunately the Virtual Observatory has developed a mechanism to support such queries: the registry. A related issue is how the components of the VOEvent infrastructure – event publishers, repositories, etc. – are represented in the registry.

In this paper we will present an overview of registries, the resource descriptions that they contain and how these can be accessed. We will also address how VOEvent infrastructure components can be registered. It is laid out as follows: in Sect. 2, we describe what a registry is, how registry instances interoperate and how they can be accessed programmatically. In Sect. 3, we discuss what the registry contents are and how these are identified and can be extended to support new resource types. In Sect. 4, we consider how to perform a constraint-based query on a registry to find resources. In Sect. 5, we turn our attention to how VOEvent resources can be represented in the registry. Finally, in Sect. 6, we suggest what components should be registered.

2 Registries

The registry is essentially a Yellow Pages for astronomical resources. Anything that is describable and identifiable – a resource – can be stored: not just data and services but also organizations, projects, software, etc. The registry holds a list of resource descriptions which are expressed as structured metadata, enabling automated processing and searching.

* Corresponding author: [email protected]

2.1 Granularity

Registries offer differing levels of granularity in the metadata they hold. Coarse-grained registries will hold high-level descriptions of organizations, archives and catalogs whereas fine-grained registries will hold low-level descriptions of individual table records, images and celestial objects. There is also a distinction between searchable registries, which can be queried, and publishing registries, which support the dissemination of resource descriptions. Indeed resources in the VO are considered to be published if one can use VO facilities, e.g. searchable registries, to find them. Note that a registry can be both searchable and publishing.

2.2 Registry model

Registries operate according to a network model where searchable registries regularly harvest new or updated resource descriptions from publishing registries. In this way the metadata in the registry network is kept current and changes quickly propagate through the system. A full searchable registry harvests from all known publishing registries and should be the first stop for client applications with a generic query. Local searchable registries harvest selectively and aim to be specialized repositories, e.g. holding domain-specific metadata such as that relating to exoplanets. If a client application has a very specific query then it might be more efficient to use an appropriate local searchable registry to address it.
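The harvesting cycle in this network model can be illustrated with a minimal in-memory sketch. The class names, the `updated` timestamp and the `harvest` method are hypothetical and chosen only for illustration; real registries exchange records through the harvesting interface rather than through direct method calls.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Record:
    identifier: str
    updated: datetime
    title: str


@dataclass
class PublishingRegistry:
    records: list = field(default_factory=list)

    def changed_since(self, since):
        """Return records updated after `since` (all records if `since` is None)."""
        return [r for r in self.records if since is None or r.updated > since]


@dataclass
class SearchableRegistry:
    sources: list
    store: dict = field(default_factory=dict)
    last_harvest: Optional[datetime] = None

    def harvest(self, now):
        """Pull new or updated records from every source, then note the harvest time."""
        for source in self.sources:
            for record in source.changed_since(self.last_harvest):
                self.store[record.identifier] = record
        self.last_harvest = now


# Hypothetical identifiers and dates, for illustration only.
publisher = PublishingRegistry(
    [Record("ivo://example.org/catalog", datetime(2007, 8, 1), "Example catalog")]
)
full = SearchableRegistry(sources=[publisher])
full.harvest(now=datetime(2007, 8, 2))

# A resource published later is picked up by the next harvest.
publisher.records.append(
    Record("ivo://example.org/images", datetime(2007, 8, 3), "Example images")
)
full.harvest(now=datetime(2007, 8, 4))
print(sorted(full.store))
```

A local searchable registry would follow the same pattern but with a shorter `sources` list or an additional filter on the harvested records.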

2.3 Registry interfaces

The International Virtual Observatory Alliance (IVOA) has defined two programmatic interfaces for registries (Benson et al. 2006): one to support harvesting and the other querying. The harvesting interface is designed to be used by registries to exchange resource descriptions and is defined as a profile on the Open Archives Initiative (OAI) harvesting standard (Lagoze et al. 2004) used in the digital library community. The search interface is intended to be the main way by which client applications discover appropriate resources. Both interfaces are SOAP-based web services and descriptions of the operations that they support are given in Tables 1 and 2 respectively.

Table 1 Operations supported by the registry harvesting interface.

Operation             Description
Identify              Returns the resource describing the registry itself
ListIdentifiers       Lists the identifiers of resources that have changed since a given date
ListRecords           Lists the full descriptions of resources that have changed since a given date
GetRecord             Returns a single resource identified by its identifier
ListMetadataFormats   Lists the available output description formats
ListSets              Lists the categories of resources that can be requested

Table 2 Operations supported by the registry search interface.

Operation       Description
Search          Returns resources that match a specific set of constraints
KeywordSearch   Returns active resources containing specified keywords
GetResource     Returns a single resource identified by its unique IVOA identifier
GetIdentity     Returns the resource describing the searchable registry itself
XQuerySearch    Optional support for XQuery-based searching
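As a rough illustration of the ListRecords harvesting operation, the sketch below builds a request URL in the style of the plain HTTP binding of the OAI harvesting protocol. This is an assumption made for brevity: the text describes the registry interfaces as SOAP-based, and the base URL is hypothetical. The `ivo_vor` metadata prefix and `ivo_managed` set name are also assumptions here and should be checked against the registry interface specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, since=None,
                     metadata_prefix="ivo_vor", record_set="ivo_managed"):
    """Build an OAI-style ListRecords request URL.

    `since` is a date string (YYYY-MM-DD); when given, only resources
    changed since that date are requested.
    """
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if record_set:
        query["set"] = record_set
    if since:
        query["from"] = since
    return base_url + "?" + urlencode(query)


# Hypothetical registry endpoint, for illustration only.
url = list_records_url("http://registry.example.org/oai", since="2007-08-31")
print(url)
```

The same pattern extends to the other harvesting operations by changing the `verb` value and the accompanying arguments.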

3 Resource metadata

The structured metadata for a resource is expressed in terms of an XML document.

3.1 Data model and its representation

The data model that underpins the resource descriptions consists of a set of resource classes. The base class is the generic Resource with derived classes to represent functional extensions such as Organisation, DataCollection, Service and Registry. Some of these may have further derivatives, e.g. Service can be extended to DataService and CatalogService.
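The class hierarchy described here can be mirrored in a few lines of Python. This is only a sketch of the derivations as stated in the text; the `identifier` and `title` fields are illustrative, and the actual XML schemata may arrange the derivations differently.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Generic base class: anything describable and identifiable."""
    identifier: str
    title: str


class Organisation(Resource):
    pass


class DataCollection(Resource):
    pass


class Service(Resource):
    pass


class Registry(Resource):
    pass


# Further derivatives of Service, as described in the text.
class DataService(Service):
    pass


class CatalogService(Service):
    pass


# Hypothetical resources, for illustration only.
resources = [
    Organisation("ivo://example.org/org", "Example organisation"),
    CatalogService("ivo://example.org/catalog", "Example catalog service"),
]
services = [r for r in resources if isinstance(r, Service)]
print([type(r).__name__ for r in services])
```

Because every class derives from Resource, a client can treat all registry entries uniformly and test for a more specific type only when it needs service-level metadata.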

The data model is formally represented as a set of XML schemata. The base schema is the VOResource schema which describes the core resource metadata used by all descriptions. This addresses such concepts as identity information, curation information – who is responsible for this resource and its description – and content information – what does this resource contain. VOResource also defines the Resource, Service and Organisation base types. There are corresponding extension schemata to cover the derived classes so, for example, the VODataService schema covers DataCollection and DataService types.

Together this set of schemata is capable of representing the full scope of astronomical resources from data centers and missions through data collections and archives to data access and general web services.

3.2 Capabilities

The Service resource extends the core metadata by adding capability metadata. A capability is the set of interfaces, protocols and behaviour supported by the service. Each standard VO protocol is considered a different capability, e.g. Cone Search (CS; Williams et al. 2007) or Simple Image Access (SIA; Tody & Plante 2004). A service can support multiple capabilities, e.g. a DataService could have both a CS and a SIA capability.

The capability metadata describes characteristics such as the maximum search radius or number of results that can be returned in a single call. It also includes a description of the service interface: for example, the SIA capability would describe an HTTP GET interface specifying the endpoint URL to use to access it and what the data format of the response would be. There can be multiple interfaces defined within a single capability and this allows for different versions of protocol standards, e.g. a SIA capability could have both a first- and second-generation interface.
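To make the relationship between a service, its capabilities and their interfaces concrete, the sketch below parses a simplified resource description and lists the access URLs of each capability. The XML fragment is hypothetical: the element names are abbreviated for illustration and it is not a schema-valid VOResource record.

```python
import xml.etree.ElementTree as ET

# Attribute name for xsi:type in ElementTree's namespace notation.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Simplified, hypothetical record: one service with a CS and a SIA capability.
RECORD = """
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <title>Example image archive</title>
  <capability xsi:type="cs:ConeSearch">
    <interface version="1.0">
      <accessURL>http://archive.example.org/cone?</accessURL>
    </interface>
  </capability>
  <capability xsi:type="sia:SimpleImageAccess">
    <interface version="1.0">
      <accessURL>http://archive.example.org/sia?</accessURL>
    </interface>
    <interface version="2.0">
      <accessURL>http://archive.example.org/sia2?</accessURL>
    </interface>
  </capability>
</resource>
"""


def list_capabilities(xml_text):
    """Map each capability type to the access URLs of its interfaces."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for capability in root.findall("capability"):
        urls = [i.findtext("accessURL") for i in capability.findall("interface")]
        result[capability.get(XSI_TYPE)] = urls
    return result


capabilities = list_capabilities(RECORD)
for cap_type, urls in capabilities.items():
    print(cap_type, urls)
```

The SIA capability in this fragment carries two interfaces, which corresponds to the case of a first- and second-generation interface described in the text.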

3.3 Identifiers

Every resource in the VO has a globally unique identifier – the IVOA identifier or IVORN – which can always be resolved in a full searchable registry. An example IVORN is

ivo://nvo.caltech/VOEvent

which identifies the VOEvent publisher at Caltech. The IVORN is composed of two components: the Authority ID – nvo.caltech – which defines a namespace for identifiers and is associated with a single publishing organisation; and the Resource Key – VOEvent – which is the unique name for the resource within the namespace.
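The decomposition of an IVORN into its two components can be sketched with the standard URL parser. The function name `parse_ivorn` is hypothetical, and the check is a minimal one that does not enforce the full identifier syntax.

```python
from urllib.parse import urlsplit


def parse_ivorn(ivorn):
    """Split an IVORN into (authority_id, resource_key)."""
    parts = urlsplit(ivorn)
    if parts.scheme != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {ivorn!r}")
    return parts.netloc, parts.path.lstrip("/")


authority_id, resource_key = parse_ivorn("ivo://nvo.caltech/VOEvent")
print(authority_id, resource_key)
```

For the example identifier in the text this yields the Authority ID `nvo.caltech` and the Resource Key `VOEvent`.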

4 Constraint-based discovery

The Search operation supported by the registry search interface allows constraint-based discovery of resources. The request is specified in terms of the Astronomical Data Query Language (ADQL; Ohishi et al. 2005) “where” clause. ADQL is essentially a subset of SQL92 with support for spatial regions.

There are three classes of constraint possible:

1. values of specific metadata, e.g.
   <Where> title like '%Deep Field%' or shortName = 'HDF' </Where>;

2. xsi:type: the type of a resource or a capability is determined by the value of this attribute in the respective element, e.g.
   <Where> capability/@xsi:type like '%SimpleImageAccess' </Where>;

3. coverage: coverage information – spatial, temporal and spectral – is provided for certain types of resource and expressed in STC. STC (Space-Time Coordinate; Rots 2007) is the IVOA XML standard for representing coordinate information. Unfortunately ADQL and STC do not interact well and this currently limits this functionality but the emerging IVOA Footprint Services will facilitate coverage-type queries.
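The first two classes of constraint can be assembled programmatically. The helper names below (`like`, `equals`, `where`) are hypothetical, and the sketch only builds constraint strings in the form shown in the list above; it does not construct or send the surrounding search request.

```python
def _quote(value):
    """Quote a string literal, doubling embedded single quotes as in SQL."""
    return "'" + value.replace("'", "''") + "'"


def like(path, pattern):
    """Pattern-match constraint on a metadata path, e.g. title like '%x%'."""
    return f"{path} like {_quote(pattern)}"


def equals(path, value):
    """Exact-match constraint on a metadata path."""
    return f"{path} = {_quote(value)}"


def where(*constraints, joiner="or"):
    """Combine constraints into a single <Where> clause."""
    return "<Where> " + f" {joiner} ".join(constraints) + " </Where>"


by_metadata = where(like("title", "%Deep Field%"), equals("shortName", "HDF"))
by_type = where(like("capability/@xsi:type", "%SimpleImageAccess"))
print(by_metadata)
print(by_type)
```

Keeping the individual constraints as separate values makes it straightforward to combine a metadata constraint with a type constraint using `joiner="and"`.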

5 VOEvent infrastructure

There is currently no extension schema to cover VOEvent infrastructure, which means that at the moment VOEvent components such as publishers and repositories must be registered as generic Resource or Service types. However, a VOEvent extension schema is under development and this will introduce two new capabilities – VOEventPublisher and VOEventRepository – to correctly support VOEvent components in the registry.

The VOEventPublisher capability will specify what kinds of events are published, e.g. GCNs, microlensing alerts, etc. and the default coordinate system used in the event packets, i.e. the value of the STC xlink attribute. It will also list the different publishing interfaces that the publisher supports, e.g. Jabber, TCP/V and RSS. Each interface will also describe its own set of metadata so a Jabber interface would specify the server name, the feed name, and the login name and password requirements whilst a TCP/V interface would just give the server name and port number and whether authentication was in use. One issue that still needs to be resolved is whether all event streams from a publisher are served by the same protocol: for example, a proprietary event stream might be served by TCP/V and a more public stream by RSS (although it could just as easily also be served by TCP/V over a different port). This might require including a list of supported event streams in the interface descriptions.
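Since the extension schema is still under development, the following is only a speculative sketch of how the publisher metadata described above might be modelled, including the option of listing supported event streams per interface. All class names, field names and values are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PublishingInterface:
    protocol: str                      # e.g. "Jabber", "TCP/V" or "RSS"
    server: str
    port: Optional[int] = None
    feed: Optional[str] = None
    authentication_required: bool = False
    streams: list = field(default_factory=list)  # event streams served here


@dataclass
class VOEventPublisherCapability:
    event_kinds: list
    default_coord_system: str
    interfaces: list

    def interfaces_for(self, stream):
        """Return the interfaces that serve the named event stream."""
        return [i for i in self.interfaces if stream in i.streams]


# Placeholder values, for illustration only.
capability = VOEventPublisherCapability(
    event_kinds=["microlensing alert"],
    default_coord_system="UTC-FK5-GEO",
    interfaces=[
        PublishingInterface("TCP/V", "events.example.org", port=8099,
                            authentication_required=True,
                            streams=["proprietary"]),
        PublishingInterface("RSS", "events.example.org", feed="public.xml",
                            streams=["public"]),
    ],
)
print([i.protocol for i in capability.interfaces_for("public")])
```

Attaching the stream list to each interface is one way to handle a publisher that serves proprietary and public streams over different protocols.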

The VOEventRepository capability will specify whose events are being stored and the level of persistence, i.e. how long the events are archived. It will also list the search interfaces that the repository supports such as web form, web service or Simple Event Access Protocol (SEAP; Graham, Auden & Warner 2007). SEAP is intended to be the standard search interface for event repositories and will allow resolution of event packets by their IVORN, finding citing packets and various forms of constraint-based querying – spatial, temporal, parametric and conceptual, depending on which section of the VOEvent packet is being queried. It is still unclear whether repositories will harvest from each other akin to registries or indeed whether one will be able to subscribe to a repository as a source of event packets. Either of these possibilities would introduce further metadata to be specified in the capability. Lastly there is a similar consideration to the publishing capability regarding different search interfaces for proprietary and public events.

6 Registration prescription

It is recommended that all VOEvent setups be registered with a VO registry and this will require a number of different resources to be identified. Firstly the overall project or group that is producing the events should be registered as an Organisation resource. Hand-in-hand with this goes registering the Authority resource that will be used to identify all resources connected to this setup. For example, the Caltech VOEvent setup is managed by CACR (ivo://nvo.caltech/CACR) which identifies all its resources with the ivo://nvo.caltech Authority ID.

If the setup is a producer of VOEvents then the publisher will need to be registered as a Service or DataService resource. Eventually this would include a VOEventPublisher capability. The IVORN of the publisher should be the same as is used in the VOEvent packets that the publisher is producing. For example, Caltech VOEvents have IVORNs of the form ivo://nvo.caltech/VOEvent#... and so ivo://nvo.caltech/VOEvent would be used to identify the registry resource describing the publisher software.
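The link between an event packet and its publisher can be sketched as follows: dropping the local part after the `#` in an event IVORN gives the identifier under which the publisher would be registered. The function name and the event local part used here are hypothetical.

```python
def publisher_ivorn(event_ivorn):
    """Return the publisher identifier for an event IVORN.

    The part after '#' identifies the individual event; the part before it
    is the identifier of the registry resource describing the publisher.
    """
    base, _, _ = event_ivorn.partition("#")
    return base


# Hypothetical event local part, for illustration only.
event = "ivo://nvo.caltech/VOEvent#example-event-001"
print(publisher_ivorn(event))
```

A follow-up client could pass the resulting identifier to a registry lookup to retrieve the description of the publisher that issued the event.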

If the setup includes a repository then this should be registered as a CatalogService. This will eventually also include a VOEventRepository capability.

References

Benson, K., Auden, E., Graham, M., et al.: 2006, http://www.ivoa.net/Documents/latest/RegistryInterface.html

Graham, M., Auden, E., Warner, P.: 2007, http://wiki.astrogrid.org/bin/view/Astrogrid/SimpleEventAccessProtocol

Lagoze, C., Van de Sompel, H., Nelson, M., Warner, S.: 2004, http://www.openarchives.org/OAI/openarchivesprotocol.html

Ohishi, M., Szalay, A., IVOA VOQL Working Group: 2005, http://www.ivoa.net/Documents/latest/ADQL.html

Rots, A.: 2007, http://www.ivoa.net/Documents/latest/STC.html

Tody, D., Plante, R.: 2004, http://www.ivoa.net/Documents/latest/SIA.html

Williams, R., Hanisch, R., Szalay, A., Plante, R.: 2007, http://www.ivoa.net/Documents/latest/ConeSearch.html
