CEDEM 2012, May 3-4

The necessity of metadata for linked open data and its contribution to policy analyses

Anneke Zuiderwijk*, Keith Jeffery**, Marijn Janssen*

*Delft University of Technology, The Netherlands
**Science and Technology Facilities Council, United Kingdom


Open governmental data

- “We are sending a strong signal to administrations today. Your data is worth more if you give it away. So start releasing it now.” (European Commission Vice President Neelie Kroes, “Digital Agenda: Turning government data into gold”, December 12, 2011)

- This is one of many examples showing that open governmental data have recently gained considerable attention.


The ENGAGE project

- ENGAGE (FP7): An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens (http://www.engage-project.eu)

- Main goal: the development and use of a data infrastructure incorporating distributed and diverse public sector information (PSI) resources.

- The ENGAGE platform will enable researchers and citizens to:
  - discover and browse datasets across diverse and dispersed public sector information resources (local, national and European) in their own language;
  - download the datasets;
  - perform geospatial search of datasets;
  - visualize properly structured datasets in data tables, maps and charts.


Open governmental data

- Open governmental data can be defined as “all stored data of the public sector which could be made accessible by government in the public interest without any restrictions on usage and distribution” (Geiger & Von Lucke, 2011, p. 185).

- For example, public sector data can be (MEPSIR study, Dekkers et al., 2006):
  - geographic data (e.g. cadastral information);
  - legal data (e.g. court decisions, legislation);
  - meteorological data (e.g. climate data, weather forecasts);
  - social data (e.g. population, public administration);
  - transport data (e.g. traffic congestion, road works);
  - business data (e.g. chamber of commerce, patents).


Linked open data (LOD)

- Focus on turning public sector data into LOD:
  1. A public body produces data (and metadata).
  2. The data become available on the Web of Data / Semantic Web.
  3. The open data can be reused.
  4. The open data can be linked to other data to show relationships.
  5. The data are both open and linked: Linked Open Data (LOD).

Figure 1: Process for creating Linked Open Data (stages: public sector (policy) data, metadata, publication on the Semantic Web, reusing open data, linking data, linked open data).


Metadata

- Metadata are part of the LOD process.

- Metadata are needed to make sense of the open data (Berners-Lee, 2009).

- Metadata are defined as “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.” (National Information Standards Organization, 2004, p. 1).

- Metadata provision in the ideal situation:
  - discovery metadata, e.g. identifier, title, creator, keywords;
  - contextual metadata, e.g. organizations, projects, funding;
  - detailed metadata, e.g. quality and domain-specific parameters.


Why metadata are necessary in analyzing LOD

- Metadata for LOD can be useful in the following situations. Metadata:
  - create order within datasets;
  - improve the storage and preservation of LOD;
  - make LOD easier to find;
  - improve the accessibility of LOD;
  - may make it possible to assess and rank the quality of LOD;
  - make it easier to analyze, compare and reproduce LOD, and therefore to find inconsistencies in it;
  - improve the chances of a correct interpretation of LOD;
  - improve the possibilities to find patterns in LOD that generate new hypotheses;
  - may improve the visualization of LOD;
  - make it easier to link data;
  - avoid unnecessary duplication of LOD.
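Several of these benefits, such as easier finding and avoiding duplication, follow directly from having even flat discovery metadata. The Python sketch below illustrates this under stated assumptions: the `DiscoveryRecord` fields are loosely modelled on Dublin Core elements, the catalog entries are made up, and treating an identical normalised title and creator as a likely duplicate is a simple heuristic chosen for illustration, not a method from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryRecord:
    """Flat discovery metadata; field names loosely follow Dublin Core."""
    identifier: str
    title: str
    creator: str
    keywords: tuple[str, ...]


def find_by_keyword(records: list[DiscoveryRecord], keyword: str) -> list[DiscoveryRecord]:
    """Return the records tagged with `keyword` (case-insensitive match)."""
    wanted = keyword.casefold()
    return [r for r in records if any(k.casefold() == wanted for k in r.keywords)]


def find_duplicates(records: list[DiscoveryRecord]) -> list[tuple[str, str]]:
    """Pair each record with an earlier one sharing the same normalised title and creator."""
    first_seen: dict[tuple[str, str], str] = {}
    duplicates: list[tuple[str, str]] = []
    for r in records:
        # Normalise case and whitespace so trivial formatting differences do not hide duplicates.
        key = (" ".join(r.title.casefold().split()), r.creator.casefold())
        if key in first_seen:
            duplicates.append((first_seen[key], r.identifier))
        else:
            first_seen[key] = r.identifier
    return duplicates


# Made-up catalog entries for illustration.
catalog = [
    DiscoveryRecord("nl-001", "Population by municipality", "Statistics Office", ("population", "municipality")),
    DiscoveryRecord("nl-002", "Traffic congestion 2011", "Road Authority", ("transport", "traffic")),
    DiscoveryRecord("eu-117", "Population  by Municipality", "statistics office", ("Population",)),
]

print([r.identifier for r in find_by_keyword(catalog, "population")])  # ['nl-001', 'eu-117']
print(find_duplicates(catalog))  # [('nl-001', 'eu-117')]
```

Without the metadata, neither question could be answered without opening and inspecting every dataset.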


Problem statement

- There are discrepancies between the benefits described in the literature and the benefits obtained in practice.

- The current situation is a long way from the ideal situation:
  - there are usually few and insufficient ways of managing metadata and interpreting LOD (for instance Hernández-Pérez et al., 2009; Schuurman et al., 2008; Xiong et al., 2011);
  - adding metadata is often viewed as an additional activity that only consumes resources.

- Statements:
  - Merely linking data is not enough to make use of open data.
  - Metadata are key enablers for the effective use of LOD in policy-making.


Requirements for a metadata architecture

- The metadata should:
  - be easily discovered;
  - support interconversion between the common metadata formats used in PSI;
  - provide a LOD representation of the metadata for browsing or query;
  - maintain the capabilities of conventional information systems with structured query, including convenient primitive operations.


Outline architecture

- The requirements lead to the following architecture:


Figure 2: An architecture of a portal server for the provision of metadata (components: a portal server with the portal metadata, an application server running the software application, and PSI dataset servers holding the PSI datasets).
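A minimal sketch of how such a portal server might separate metadata from data, assuming hypothetical classes `PSIDatasetServer` and `PortalServer`: discovery runs only against the portal metadata, and each metadata entry points to the PSI dataset server that hosts the actual data. Names, fields and the in-memory storage are illustrative assumptions, not part of the ENGAGE design.

```python
from dataclasses import dataclass, field


@dataclass
class PSIDatasetServer:
    """Hypothetical server hosting the PSI datasets themselves."""
    name: str
    datasets: dict[str, list[dict]] = field(default_factory=dict)

    def fetch(self, dataset_id: str) -> list[dict]:
        return self.datasets[dataset_id]


@dataclass
class PortalServer:
    """Hypothetical portal server: keeps metadata plus a pointer to the hosting server."""
    servers: dict[str, PSIDatasetServer] = field(default_factory=dict)
    portal_metadata: dict[str, dict] = field(default_factory=dict)

    def register(self, server: PSIDatasetServer, dataset_id: str, metadata: dict) -> None:
        self.servers[server.name] = server
        self.portal_metadata[dataset_id] = {**metadata, "hosted_on": server.name}

    def search(self, keyword: str) -> list[str]:
        """Discovery uses the portal metadata only; no dataset server is contacted."""
        wanted = keyword.casefold()
        return [ds_id for ds_id, md in self.portal_metadata.items()
                if wanted in md.get("title", "").casefold()]

    def resolve(self, dataset_id: str) -> list[dict]:
        """Follow the metadata pointer to the PSI dataset server and fetch the data."""
        host = self.portal_metadata[dataset_id]["hosted_on"]
        return self.servers[host].fetch(dataset_id)


# What a running application on the application server might do (illustrative data).
local = PSIDatasetServer("local-psi", {"ds-1": [{"year": 2011, "population": 100}]})
portal = PortalServer()
portal.register(local, "ds-1", {"title": "Population statistics", "creator": "Municipality"})
hits = portal.search("population")
print(hits)  # ['ds-1']
print(portal.resolve(hits[0]))  # [{'year': 2011, 'population': 100}]
```

Keeping only metadata at the portal means the datasets can stay with their owners while still being discoverable in one place.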


Metadata

- Metadata should be used to implement this architecture.

A 3-layer structure for metadata is used:

a) discovery (flat) metadata, for example:
   - Dublin Core (DC);
   - e-Government Metadata Standard (e-GMS);
   - Comprehensive Knowledge Archive Network (CKAN);
   - or similar ‘flat’ metadata;

b) contextual metadata, using the Common European Research Information Format (CERIF);

c) detailed metadata.
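To make the three layers concrete, the sketch below models one dataset's metadata in Python. The class names are hypothetical; the contextual layer is only loosely inspired by CERIF's idea of typed, time-bounded links between entities such as organisations and projects, and is not the CERIF schema itself. The detailed layer is left as a free-form dictionary because its content is subject- or topic-specific, and the example values are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DiscoveryMetadata:
    """Layer a): flat discovery metadata with Dublin Core-style element names."""
    identifier: str
    title: str
    creator: str
    subject: list[str]


@dataclass
class Entity:
    kind: str        # illustrative kinds: "Dataset", "OrgUnit", "Project"
    entity_id: str
    name: str


@dataclass
class Link:
    """Typed, time-bounded relationship between two entities (CERIF-inspired, simplified)."""
    source: str
    target: str
    role: str
    start: date
    end: Optional[date] = None


@dataclass
class ContextualMetadata:
    """Layer b): entities plus the semantic links between them."""
    entities: dict[str, Entity] = field(default_factory=dict)
    links: list[Link] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def related(self, entity_id: str, role: str) -> list[Entity]:
        return [self.entities[link.target] for link in self.links
                if link.source == entity_id and link.role == role]


@dataclass
class DatasetMetadata:
    discovery: DiscoveryMetadata
    context: ContextualMetadata
    detail: dict[str, object]  # layer c): subject- or topic-specific parameters


ctx = ContextualMetadata()
ctx.add(Entity("Dataset", "ds-1", "Air quality 2011"))
ctx.add(Entity("OrgUnit", "org-7", "Environmental agency"))
ctx.add(Entity("Project", "prj-3", "Monitoring programme"))
ctx.links.append(Link("ds-1", "org-7", "producedBy", date(2011, 1, 1)))
ctx.links.append(Link("ds-1", "prj-3", "outputOf", date(2011, 1, 1), date(2011, 12, 31)))

record = DatasetMetadata(
    DiscoveryMetadata("ds-1", "Air quality 2011", "Environmental agency", ["environment"]),
    ctx,
    {"pollutant": "NO2", "unit": "ug/m3", "sampling_interval_hours": 1},
)
print([e.name for e in record.context.related("ds-1", "producedBy")])  # ['Environmental agency']
```

The flat layer answers "what is this?", while the link layer records who produced it, under which project and when, which a flat record can only hold as loose text.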


The Vision: Metadata for Data Model

(Figure: metadata layers DISCOVERY (DC, eGMS…), CONTEXT (CERIF) and DETAIL (subject or topic specific), with arrows labelled "Generate" and "Point to", connected to linked open data and formal information systems.)


Design

- The presented structure provides the following improved facilities:
  - CERIF provides much richer metadata than the standards commonly used with PSI datasets.
  - The representation of contextual metadata (CERIF) allows rich semantics to be represented, thus making the PSI datasets understandable to the end user (or software) through the metadata.
  - The Structured Query Language (SQL) has a simpler structure than SPARQL and includes convenient primitive operations for simple statistical calculations such as sum, count and average.
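The point about primitive operations can be illustrated with Python's built-in `sqlite3` module. The `congestion` table and its figures are made up for illustration; the query shows `COUNT`, `SUM` and `AVG` computed per group in a single statement.

```python
import sqlite3

# In-memory database with a made-up table of PSI-style figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE congestion (region TEXT, year INTEGER, hours REAL)")
conn.executemany(
    "INSERT INTO congestion VALUES (?, ?, ?)",
    [("North", 2010, 120.0), ("North", 2011, 150.0),
     ("South", 2010, 80.0), ("South", 2011, 100.0)],
)

# COUNT, SUM and AVG are built-in SQL aggregate functions, applied per region here.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n, SUM(hours) AS total, AVG(hours) AS mean
    FROM congestion
    GROUP BY region
    ORDER BY region
    """
).fetchall()
for row in rows:
    print(row)
# ('North', 2, 270.0, 135.0)
# ('South', 2, 180.0, 90.0)
conn.close()
```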


Benefits of architecture

- Because of the powerful expressive semantics over formal syntax of CERIF, we can:
  - generate discovery metadata from CERIF;
  - interconvert common metadata formats used in PSI, using CERIF as the superset exchange mechanism;
  - provide a Semantic Web / LOD representation of the metadata for browsing or query using SPARQL;
  - at the same time, maintain a conventional information systems capability with structured query, including convenient primitive operations.
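A rough sketch of the first and third capabilities: deriving flat discovery metadata from a richer contextual record, then publishing it as LOD. The `cerif_like` dictionary is a hypothetical, heavily simplified stand-in for CERIF; the mapping to Dublin Core (for example, taking the creator from a `producedBy` link) is an assumption; the output is serialised as N-Triples using the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`, and the dataset URI is a placeholder.

```python
# Hypothetical, heavily simplified stand-in for a CERIF record (not the real CERIF schema).
cerif_like = {
    "dataset": {"id": "ds-1", "title": "Air quality 2011", "uri": "http://example.org/dataset/ds-1"},
    "entities": {
        "org-7": {"kind": "OrgUnit", "name": "Environmental agency"},
        "prj-3": {"kind": "Project", "name": "Monitoring programme"},
    },
    "links": [
        {"source": "ds-1", "target": "org-7", "role": "producedBy"},
        {"source": "ds-1", "target": "prj-3", "role": "outputOf"},
    ],
    "keywords": ["environment", "air quality"],
}

DC = "http://purl.org/dc/elements/1.1/"


def to_dublin_core(rec: dict) -> dict:
    """Generate flat discovery metadata; the creator comes from 'producedBy' links (assumed mapping)."""
    ds = rec["dataset"]
    creators = [rec["entities"][link["target"]]["name"]
                for link in rec["links"]
                if link["source"] == ds["id"] and link["role"] == "producedBy"]
    return {"identifier": ds["id"], "title": ds["title"],
            "creator": creators, "subject": list(rec["keywords"])}


def nt_literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal, escaping backslash, quote and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(subject_uri: str, dc: dict) -> list[str]:
    """Serialise each Dublin Core element value as one N-Triples statement."""
    triples = []
    for element, value in dc.items():
        for v in (value if isinstance(value, list) else [value]):
            triples.append(f"<{subject_uri}> <{DC}{element}> {nt_literal(v)} .")
    return triples


dc = to_dublin_core(cerif_like)
triples = to_ntriples(cerif_like["dataset"]["uri"], dc)
print("\n".join(triples))
```

Because the flat record is derived rather than maintained by hand, it can be regenerated whenever the richer record changes.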


Models for an infrastructure

- The data model, with the metadata described above, is only one of the relevant models.

- The other models are:
  - the user model;
  - the processing model;
  - the resource model.


The Vision: The Models

(Figure: the processing model, user model, data model and resource model, spanning the complete cohort of users and the complete ICT environment for PSI.)


Models – User model

- User Model: controls the way in which the end user interacts with the e-infrastructure:
  - user profile, security certification, privacy;
  - device and interaction mode preferences (from keyboard/mouse through voice and gesture to brain-connected), language preference;
  - resource preferences (including contacts) with directories.

- METADATA
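A small sketch of how user-model metadata could drive the interaction, here choosing a dataset title in the user's preferred language and checking access. The `UserModel` fields, the clearance labels and the fallback rule are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserModel:
    """Hypothetical user-model metadata: profile, preferences and clearances."""
    user_id: str
    languages: list[str]                      # ordered by preference, e.g. ["nl", "en"]
    interaction_mode: str = "keyboard/mouse"
    clearances: set[str] = field(default_factory=set)


def localized_title(titles: dict[str, str], user: UserModel, fallback: str = "en") -> str:
    """Return the title in the first preferred language available, else the fallback."""
    if not titles:
        raise ValueError("no titles available")
    for lang in user.languages:
        if lang in titles:
            return titles[lang]
    return titles.get(fallback, next(iter(titles.values())))


def can_access(user: UserModel, required_clearance: Optional[str]) -> bool:
    """Open datasets need no clearance; otherwise the user profile must hold it."""
    return required_clearance is None or required_clearance in user.clearances


user = UserModel("u-42", ["de", "nl"], clearances={"public"})
titles = {"en": "Population by municipality", "nl": "Bevolking per gemeente"}
print(localized_title(titles, user))  # Bevolking per gemeente
print(can_access(user, "restricted"))  # False
```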


Models – Processing model

- Processing Model: controls the way processes are constructed and executed in the e-infrastructure:
  - Services:
    - described for discovery, and described for functional and non-functional (security, privacy, performance) properties;
    - mobile (deployed in distributed / parallel execution environments);
    - open source where possible.
  - Service composition:
    - dynamically (re-)composable during execution.

- METADATA
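As an illustration of services described for discovery with functional and non-functional properties, and composed at run time, consider the following Python sketch. The `ServiceDescription` fields, the selection rule (fastest service meeting the constraints) and the example services are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical service metadata: one functional and three non-functional properties."""
    name: str
    function: str          # functional property, e.g. "clean"
    encrypted: bool        # security
    personal_data: bool    # privacy: does the service process personal data?
    latency_ms: int        # performance
    run: Callable[[list[float]], list[float]]


def select(registry: list[ServiceDescription], function: str, max_latency_ms: int,
           require_encryption: bool = True,
           allow_personal_data: bool = False) -> Optional[ServiceDescription]:
    """Return the fastest service for `function` that meets the non-functional constraints."""
    candidates = [
        s for s in registry
        if s.function == function
        and s.latency_ms <= max_latency_ms
        and (s.encrypted or not require_encryption)
        and (allow_personal_data or not s.personal_data)
    ]
    return min(candidates, key=lambda s: s.latency_ms, default=None)


def compose(*services: ServiceDescription) -> Callable[[list[float]], list[float]]:
    """Chain services at run time; calling compose again with new selections re-composes it."""
    def pipeline(values: list[float]) -> list[float]:
        for s in services:
            values = s.run(values)
        return values
    return pipeline


def drop_negative(xs: list[float]) -> list[float]:
    return [x for x in xs if x >= 0]


def scale_to_max(xs: list[float]) -> list[float]:
    peak = max(xs, default=0.0)
    return [x / peak for x in xs] if peak else list(xs)


registry = [
    ServiceDescription("clean-a", "clean", True, False, 40, drop_negative),
    ServiceDescription("clean-b", "clean", False, False, 10, drop_negative),
    ServiceDescription("scale", "normalise", True, False, 25, scale_to_max),
]
clean = select(registry, "clean", max_latency_ms=50)  # clean-b is faster but unencrypted
norm = select(registry, "normalise", max_latency_ms=50)
pipeline = compose(clean, norm)
print(clean.name, pipeline([2.0, -1.0, 4.0]))  # clean-a [0.5, 1.0]
```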


Models – Data model

- Data Model: controls data representation and data (re-)use:
  - formal syntax (structure), even for text, images and streamed video;
  - declared semantics (meaning).

- METADATA
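One way to read "formal syntax plus declared semantics" is a schema in which every field has a machine-checkable type and a URI stating its meaning. The sketch assumes a hypothetical three-field schema with placeholder `example.org` URIs; it is not a schema from the ENGAGE project.

```python
# Hypothetical schema: field -> (formal type = syntax, URI = declared meaning).
SCHEMA: dict[str, tuple[type, str]] = {
    "municipality": (str, "http://example.org/def/municipality"),
    "year": (int, "http://example.org/def/referenceYear"),
    "population": (int, "http://example.org/def/residentPopulation"),
}


def validate(record: dict, schema: dict[str, tuple[type, str]] = SCHEMA) -> list[str]:
    """Check a record against the declared syntax; return a list of problems (empty if valid)."""
    errors = []
    for name, (expected, _uri) in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{name}: expected {expected.__name__}")
    for name in sorted(record.keys() - schema.keys()):
        errors.append(f"undeclared field: {name}")
    return errors


def meaning(field_name: str, schema: dict[str, tuple[type, str]] = SCHEMA) -> str:
    """Look up the declared semantics (a URI) of a field."""
    return schema[field_name][1]


print(validate({"municipality": "Delft", "year": 2011, "population": "n/a", "source": "x"}))
# ['population: expected int', 'undeclared field: source']
print(meaning("population"))  # http://example.org/def/residentPopulation
```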


Models – Resource model

- Resource Model: catalogs the available computing resources in the e-infrastructure:
  - this allows virtualisation, so the user neither knows nor cares where the data come from or where the processing is done, as long as quality of service is maintained;
  - it requires updating by resource owners, together with conditions of use.

- METADATA
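The virtualisation idea can be sketched as a catalog that owners keep up to date and a resolver that picks a location on the user's behalf. Everything here is hypothetical: the `Resource` fields, using bandwidth as the only quality-of-service measure, and the conditions-of-use labels.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical resource-model entry, kept up to date by the resource owner."""
    location: str
    dataset_ids: set[str]
    available: bool
    bandwidth_mbps: float      # the only quality-of-service measure in this sketch
    conditions_of_use: str


class ResourceModel:
    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}

    def update(self, resource: Resource) -> None:
        """Owners register a resource or overwrite its previous entry."""
        self._resources[resource.location] = resource

    def resolve(self, dataset_id: str, min_bandwidth_mbps: float,
                accepted_conditions: set[str]) -> str:
        """Choose where to serve `dataset_id` from; the user never names a location."""
        candidates = [
            r for r in self._resources.values()
            if dataset_id in r.dataset_ids
            and r.available
            and r.bandwidth_mbps >= min_bandwidth_mbps
            and r.conditions_of_use in accepted_conditions
        ]
        if not candidates:
            raise LookupError(f"no resource meets the requested quality of service for {dataset_id}")
        return max(candidates, key=lambda r: r.bandwidth_mbps).location


model = ResourceModel()
model.update(Resource("server-a", {"ds-1"}, True, 100.0, "open-licence"))
model.update(Resource("server-b", {"ds-1"}, True, 400.0, "open-licence"))
model.update(Resource("server-c", {"ds-1"}, False, 1000.0, "open-licence"))
first = model.resolve("ds-1", 50, {"open-licence"})
model.update(Resource("server-b", {"ds-1"}, False, 400.0, "open-licence"))  # owner takes server-b offline
second = model.resolve("ds-1", 50, {"open-licence"})
print(first, second)  # server-b server-a
```

The caller's request is identical both times; only the owner-maintained metadata changed, which is what lets the location stay invisible to the user.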


Conclusions (1)

- Metadata are needed to make sense of the open data.
- Merely linking data is not enough to make optimal use of open data.
- Metadata are key enablers for policy-making.
- Adding metadata can yield considerable benefits, including:
  - creating order in datasets;
  - improving the findability, accessibility, storage and preservation of LOD;
  - making it easier to analyze, compare and reproduce LOD and to find inconsistencies;
  - correct interpretation and visualization of LOD;
  - finding patterns in LOD to generate new hypotheses;
  - making the linking of data easier;
  - assessing and ranking the quality of LOD, and avoiding unnecessary duplication of LOD.


Conclusions (2)

- Architecture for metadata:
  - discovery metadata can be generated from CERIF;
  - common metadata formats can use CERIF as the superset exchange mechanism;
  - a LOD representation of the metadata for browsing or query can be made, allowing the use of SPARQL;
  - at the same time, a conventional information systems capability with structured query, including convenient primitive operations, can be maintained.
- We recommend further implementing the proposed metadata architecture.
