University of Groningen

Industrial Engineering and Management

Bachelor Thesis

Supervisors: prof. dr. H.G. Sol (University of Groningen),

drs. Ir. T.A. van den Broek (TNO)

Open Data: a design for the provisioning of

Dutch government public and geo-spatial

transport data. (FINAL DRAFT)

Groningen, January 31, 2011

Abstract

Governments recently started publishing structured, machine readable and free public sec-

tor information for commercial and public re-use. They are moving from a closed model

in which businesses pay a cost that maximizes government profit towards a free model in

which data is freely available without any cost. This form of public sector information

provisioning is also referred to as open data. In this paper we present a design, consisting of a

business model and a data warehouse, for Dutch public and geo-spatial transport data. Furthermore,

the implications of a governmental open data policy on the business case of various stakeholders that work with

public and geo-spatial transport data are examined.

To establish the theoretical underpinnings that have led to open data policies, a literature

review and interviews with specialists were conducted. We found that the proliferation of

the internet as a new participatory and economic platform, the development of freedom of

information and transparency policies and the economic benefits of free public sector infor-

mation, have contributed to the development of open data. We found that if government

data were to be made available at zero or marginal cost this could lead to significant in-

creases in economic activity. Businesses could use the different data sets to create services

and therefore add value to the data. This economic activity in its turn would lead to more

revenue for the businesses and increase overall welfare. The government would benefit from

this activity through taxation of the services.

A business model of open data in the public and geo-spatial transport sector was designed.

In this model barriers in legislation are removed, and suitable pricing strategies and a technical

implementation for open data are recommended. We found that this model causes changes in

the business case of data providing organizations and businesses. In particular, the cost structure

of these stakeholders should be changed. Finally, a design for a data warehouse

for road and public transport data is presented. The design covers a warehouse architec-

ture, data model, interface design, hardware recommendations and qualitative aspects. In

the final section of the paper we discuss some of the findings in relation to economic activity,

loss of intellectual property, licensing of open data and changes in government cost-structure.

Keywords: public sector information, open data, design, business case, data-warehouse,

public transport, geo-data, economics, transparency, governments

Open Data: a design for the provisioning of Dutch government public and geo-spatial transport

data, by J.P.S. van Grieken, is licensed under a Creative Commons

Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Contents

1 Introduction
  1.1 Context
    1.1.1 The Networked Society
    1.1.2 The move to transparency
  1.2 Open Data
  1.3 Problem Definition
  1.4 Objective

2 Theory
  2.1 The economics of open data
  2.2 Dutch government information architecture
  2.3 Stakeholders
  2.4 The business model
  2.5 Research Question

3 Methods
  3.1 Literature Review
  3.2 Open Interviews
  3.3 Stakeholder Identification
  3.4 Structured interviews
  3.5 Business case analysis
  3.6 Requirements analysis
  3.7 Data Warehouse design

4 Business Model Design
  4.1 Effects of the model
  4.2 Effects on the stakeholder business cases

5 Technology Design
  5.1 Landscape
  5.2 Warehouse Architecture
  5.3 Data Model
  5.4 Interface
  5.5 Hardware
  5.6 Qualitative Aspects

6 Discussion
  6.1 Effects on businesses
  6.2 Changes in government cost structures
  6.3 Loss of intellectual property and market disturbance
  6.4 Legal: insuring coverage, quality, privacy and neutrality of data
  6.5 Data vs. Services
  6.6 Risks of the design

7 Appendix
  .1 Requirements Document
  .2 Interview Schema
  .3 Final Presentation
  .4 List of Interviews
  .5 Acknowledgement

Chapter 1

Introduction

“Political participation, civil society, and transparency are among the indispensable

elements that are the imperatives of democratization.” As quoted from a

speech at Harvard University, Kennedy School of Government by Recep Tayyip

Erdogan, January 30th 2003

Long before the rise of computer technology, governments started to collect vast

amounts of structured data. Already in 1811 the cadastre started measuring and recording

the ownership of land1. And in 1899 the Central Bureau for Statistics (CBS) kept detailed

records and statistics on the Dutch population in order to allow decision makers to construct

effective economic policies. Most of this data is used by different governmental organizations

to serve the public in their daily operations. For example, the cadastre uses the detailed maps

they have gathered to determine the boundaries of land when sold. Nowadays, this struc-

tured data is stored in large data warehouses owned and maintained by different branches

of government. Estimates suggest that between 100 and 150 Dutch governmental organizations

possess data that could be relevant to the public or to businesses [1].

If this government data were to be made available at zero or marginal cost this could

lead to significant increases in economic activity. Businesses could use the different data sets

to create services and therefore add value to the data. This economic activity in its turn

would lead to more revenue for the businesses and increase overall welfare. The government

would benefit from this activity through taxation of the services.

There are three main reasons that this business potential remains untapped in the Nether-

lands. First of all, governments often choose a pricing strategy that either maximizes profit

or returns the long-term average cost. This causes a barrier for businesses to re-use the data

because the cost to gather the information themselves is similar to buying it directly from

the government. Secondly, law and policy restrictions apply to most of the datasets the

government owns. For example, copyright and database law restrictions limit businesses in

the services that could possibly be built on this data. Finally, most government bodies lack

the technical infrastructure to deliver high quality data to businesses at high speed.

1.1 Context

Before we begin the analysis of the economics and technical infrastructure needed for our

design we first want to explain the developments in legislation and society that have led to

open data.

1.1.1 The Networked Society

The first important development that has made open data possible is the rise of the internet

within our society. The internet has created a market for information services and goods. It

has created possibilities for collaboration and trade of information goods and services and

is developing as a major distribution platform for these services.

Everywhere around the globe broadband access has been pushed into markets to connect

people to the internet. For some years now, almost everybody in the Netherlands

has access to the internet via a computer or mobile device. The access to the internet has

risen from 77% in 2004 to 93% in 2009 [3]. These new forms of communication have enabled

citizens to communicate in new ways amongst themselves and with public institutions. Net-

works of people continue to form the structures and organization of society, a phenomenon

which is mainly referred to as the rise of the network society [4]. These ways of interaction

create new ways of collaboration among citizens in terms of speed, scale, anonymity, inter-

activity and community building. The internet provides a market for people to collaborate

and is described by Antonijevic and Gurak as

“[The internet] has brought easy to use content-creating applications such as

blogs, wikis, social networking sites, and file sharing platforms rooted in broad-

band access, affordable hardware and software solutions, and with the Internet

perceived and used as a new normal in contemporary way of life.” [5].

The development of the internet as a network of individuals collaborating is regarded

as a new way of creating economic value. The OECD sees the web as one of the drivers

for creativity and economic development among people in the coming century [6]. In the

field of software construction this has led to collaborative software creation between

programmers and other specialists from all over the globe, which is referred to as open source

software. Open source software challenges the rules of economics, software development and

IT management. On development networks like sourceforge.net, large numbers of programmers

work together on software projects without any financial compensation [7].

These programmers engage in civil society and organize ’bar camps’2 and online platforms

where they meet and try to construct software that helps governments and citizens in their

daily lives. A good example of a developed network is the Sunlight Labs in the United States

which counts around 2700 volunteering programmers 3 that work on various projects. In

Europe a large community of programmers can be found in the United Kingdom, Denmark

and Spain.

A study in the United Kingdom looked at the motivation of these communities of pro-

grammers in relation to open data. Citizens showed a desire to engage with government in

open data initiatives. The survey indicated that 36% wanted to be actively involved and

use the data, versus 33% who were ‘just happy to get the data’. Similar effects have been found in

the relation between citizens and the government in the Netherlands[8]. A study by TNO

suggests that the rise of the social web (web 2.0) causes citizens to create new platforms

that they use to organize, collaborate, share, trade and create [10]. These platforms are

open in nature, require visitors to collaborate and try to use the distributed knowledge of

all the participants. We have now described the implications that give open data its societal

context. The networked society has led to a collaboration platform and potential market

for open data.

1.1.2 The move to transparency

In most countries that have adopted open data policies the development originated from

transparency and freedom of information laws. The term transparency has many different

definitions depending on specific use and context. In the field of politics and government

transparency is usually referred to as ’social transparency’. This form of transparency is

defined as “Social Transparency allows citizens to be more informed and encourages the

disclosure as a regulation mechanism of centers of authority. It is based on ethics and gov-

ernance, where the interests and needs are focused in the citizens” [11]. Governments use

Freedom of Information (FOI) laws to define the formal rights and degrees of freedom of

transparency within a nation. The first freedom of information laws came into effect after

the Second World War, but in most countries these types of laws are still in development. A

study on freedom of information laws found that in 1985 only 11 countries had adopted freedom

of information laws, but that by 2004 almost 59 countries had some form of transparency

law passed through parliament[12]. Transparency and the right to obtain government in-

formation are seen as essential to corruption prevention, democratic participation, trust in

government, accountability, informed decision making, and provisioning of information to

the public [13]. As a tool, the internet allows for easy publishing and rapid sharing of public

sector information in relation to Freedom of Information rights. The internet has made

public sector organizations more transparent and able to respond to citizen needs more

rapidly [15].

The United States has a rich history of freedom of information and transparency policies [16].

In 1997 it experimented with one of the first government transparency websites, called

Fedstats.com. This website publishes statistics from all the federal government

agencies. Furthermore, in the last 20 years various transparency laws have

been approved by the Senate. In 2006 the Federal Funding Accountability and Transparency Act was

adopted providing high degrees of budget transparency. A year later the Honest Leadership

and Open Government Act followed and provided accountability and openness to citizens.

The final chapter in freedom of information laws in the United States was the Memorandum

on Transparency and Open Government4. In this memorandum the Obama administration

calls on all federal agencies for an unprecedented level of openness. The memorandum declares

that all departments should be transparent, participatory and collaborative. With this

memorandum the administration promotes accountability, public engagement, public par-

ticipation and crowdsourcing using internet technology. The most important development

is that the United States government considers all data it gathers to be a ‘national public

asset’ that should therefore be available to all citizens in a structured format.

In Europe similar policies have been adopted in the United Kingdom, Norway, Spain, Den-

mark, Estonia and Greece5. Although most of the initiatives are still in a development phase,

some similarities can be pointed out. The Danish government launched an open government

strategy which contained public sector information provisioning called ’Offentlige Data I

Spil’ aimed at providing a portal website that provides structured data to citizens. Similar

data portals have been constructed in the United Kingdom6, the Catalan region of Spain

(Aporta)7 and Norway8. In terms of policy some developments at the level of the European

Commission can be pointed out. The first important piece of legislation on the use of public

sector information is Directive 2003/98/EC on the re-use of public sector information9. This

directive describes the development of a European data products market based on public sector

information. The main goal of this directive is to make available documents

that are re-usable for commercial and non-commercial purposes, where possible through

electronic means. The member states are allowed to charge for the cost of collection, pro-

duction, reproduction and dissemination together with a reasonable return on investment.

Some European studies have been carried out on the effects of public sector information.

The Commercial exploitation of Europe’s public sector information report issued by the

European Commission estimates the total value of the public sector information in Europe

between EUR 28 billion per annum and EUR 134 billion per annum, with a central estimate

of EUR 68 billion[17]. The last relevant European development was the eUnion program

that ran under the Swedish presidency of the European Union. In the Visby declaration10 the

European member states declare that “EU member states and community institutions should

seek to make data freely accessible in open machine-readable formats, for the benefit of

entrepreneurship, research and transparency”. This declaration has as of now not yet been

put into legislation.

Although the Netherlands scores high on the digital e-readiness ranking[18] there is no clear

open government program as can be found in other European member states. An open

government study found that the Dutch government lacks leadership, central coordination and

focus, has trouble distinguishing between open data and participation, and is wary of the business

case of open government[?]. The Dutch government has been experimenting with participation

subsidies and has supported some pilots in the field of open data. In terms of legislation

no far reaching freedom of information laws have been adopted by the government. Copy-

right, Freedom of Information and database laws still prohibit the distribution of open data

by central government. Also, no policy programs promoting open government or open data

have been announced. The government is however conducting some research into the pos-

sibilities of open data in the Netherlands. In order to successfully implement open data

within a country a culture of freedom of information supported by legislation is required.

1.2 Open Data

Before we can elaborate on the problem definition we need a consistent definition of open data.

Open Data is defined as the publishing of structured, free, and machine readable public

sector information [2], where public sector information (PSI) is information gathered by

governmental bodies and stored in some structured form. Open Data should not be confused

with open source or open standards, which concern software and digital communication protocols

respectively.

1.3 Problem Definition

In this section we will state the societal problem that underlies our research question. The

data governments collect in their daily operations represents an economic value, and therefore

economic potential. This economic value currently remains untapped in the Netherlands.

Therefore, the problem definition for this study is:

The business potential of open government data in the Netherlands remains untapped

which causes loss of economic activity.

1.4 Objective

The objective of this study is to create a design for the provisioning of open public and

geo-spatial transport data. This study has been conducted in a period of three months and

is part of a larger study into the cost-benefit relations of open data at the Netherlands

Organization for Applied Scientific Research (TNO). The study also serves as the bachelor

thesis Industrial Engineering & Management of mr. J.P.S. van Grieken at the University of

Groningen.

Before we start with the design we need to establish the basic premise of our problem

definition: open government data causes economic activity. Once we have established this, we

need to find the main causes of our problem. Once we have found those causes, we will

create a design that covers both the societal problem and a technical implementation.

For scoping purposes we will be looking at two types of data: public and geo-spatial transport

data. We chose these data types because of their market popularity in foreign open data

initiatives.

Chapter 2

Theory

In this chapter we use theory to identify the causes of our problem. We will start with

an elaboration of the economic case for open data. Then we will briefly introduce Dutch

government information architecture, and describe how this acts as a barrier for open data.

After that we will describe the business model of open data. This will result in elaboration

and justification of the research question.

2.1 The economics of open data

The main premise of this study is that open data causes a positive economic effect. This

section elaborates on the economic literature available on open data.

In their daily operation governments collect data in order to perform their primary tasks

such as determination of land ownership or running a public bus service. The data collected

represents both an economic value and an investment value. The investment value of

this data is what governments pay in order to collect, maintain and distribute data. The

economic value of this data represents the part of the national income which can be

attributed to businesses that create services using the data, or combine it with other data in

order to add value. Studies performed by the European Commission suggest that the total

economic value lies between €28 billion per annum and €134 billion per annum, with a

central estimate of €68 billion. In 2000 the total investment of European member states in

public sector information was valued at €9.5bn [17].

Usually, public services that have been paid for by taxpayers can only be used once. The na-

ture of information and data however provides the option for it to be copied and distributed

at nearly no extra cost [19]. When governments decide to publish free and machine readable

data, value can be created in the market in the same way. Businesses reusing public sector

information do not need to gather the data themselves, which lowers the investment and

time to market. Furthermore, innovative companies will use data previously not available

to create new services. Other economic effects of open data can be found within government

itself. Research has shown that these forms of openness reduce corruption [20], which in the

end leads to a more transparent and efficient government [13]. These specific effects are,

however, out of scope for this paper.

Before we go into the details of the economic effects of open data we can describe the value

chain of information products in order to analyze the business case[17]. The value chain for

information products starts with the creation or collection of various forms of data. After

this process the data needs to be collected and stored in a form that allows for structured

retrieval. The next step is processing and packaging which allows for delivery of the data.

This final delivery process is used to bring the data to the client or end-user in a form defined

by the processing and packaging stage.

Figure 2.1: The data value chain

We will now give an example of how this value chain applies to the areas we have selected.

The Dutch railway network operator Pro-rail embedded sensors in the rail network that

can pinpoint the location of trains (creation). This data is collected and together with other

meta data stored into a database (collection & storage). The train operators in the Nether-

lands require this data to be able to adjust train schedules. Pro-rail therefore packages the

data in such a way that the operators can use it to adjust their planning and communicate

with travelers about delays (processing & packaging). Pro-rail uses a computer interface to

deliver this data to the different train operators in the country (delivery). The data that

has been delivered to the train operators represents value because it allows the operators to

utilize their material in a more optimal way and provide service to their customers. In the

case of open data, governments will deliver the processed and packaged data at no cost to

businesses and the public.
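
The four stages of the value chain described above can be sketched as a small pipeline. This is only an illustration under our own assumptions: the record fields, train identifiers and function names are hypothetical and do not describe the actual Pro-rail systems. The only point it makes is that open data changes the last stage (the price charged at delivery), not the earlier ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrainPosition:
    """One raw sensor reading (creation stage). All fields are hypothetical."""
    train_id: str
    track_section: str
    timestamp: int  # seconds since midnight


def collect_and_store(readings: List[TrainPosition]) -> Dict[str, List[TrainPosition]]:
    """Collection & storage: group readings per train, ordered by time."""
    store: Dict[str, List[TrainPosition]] = {}
    for reading in sorted(readings, key=lambda r: r.timestamp):
        store.setdefault(reading.train_id, []).append(reading)
    return store


def process_and_package(store: Dict[str, List[TrainPosition]]) -> List[dict]:
    """Processing & packaging: keep only the latest known position per train."""
    return [
        {
            "train": train_id,
            "section": history[-1].track_section,
            "seen_at": history[-1].timestamp,
        }
        for train_id, history in sorted(store.items())
    ]


def deliver(package: List[dict], price_per_record: float = 0.0) -> dict:
    """Delivery: hand over the packaged data; under open data the price is zero."""
    return {"records": package, "invoice": price_per_record * len(package)}


# Example run with made-up readings (creation stage).
readings = [
    TrainPosition("IC-1", "A", 100),
    TrainPosition("SPR-2", "C", 120),
    TrainPosition("IC-1", "B", 160),
]
result = deliver(process_and_package(collect_and_store(readings)))
print(result)
```

Passing a non-zero `price_per_record` to `deliver` would model a charged delivery instead of open data; the earlier stages stay the same.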

Different costing methods have been proposed for public sector information in order to

maximize the return on investment for governments. The return governments can get on

public sector information is a trade-off between charging directly for the data and providing

the data at marginal or no cost at all. In the latter case the return on investment is

achieved through regular taxation on the economic activities that businesses perform with

the data. Pollock describes three possible pricing policies governments could use for public

sector information distribution and investigates their returns [21]. In a profit-maximization

strategy governments set their prices to maximize the profit given the demand for the data.

An average-cost or cost-recovery strategy sets the price such that revenue covers the total cost of

data collection and distribution. In this case the users of the data pay for the entire value

chain of the data. The final policy is the marginal or zero cost strategy in which the prices

are equal to the short-term marginal cost. In many cases these costs will be zero because

agencies that have already created distribution channels for the data to other government

bodies will not have to charge for delivery of data to the market. For example, the cadastre

already distributes geo-spatial data to local authorities and therefore should not charge

businesses to use this delivery infrastructure. In the Netherlands depending on the specific

government organization different pricing strategies are used. The most dominant strategies

are profit maximization or average-cost policies.
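
To make the difference between the three pricing policies concrete, the sketch below computes the price under each policy for a hypothetical dataset. It assumes a linear demand curve q = a - b * price, a fixed cost for collection and a constant marginal cost per delivery. The demand curve and all numbers are our own illustrative assumptions and are not taken from Pollock's analysis [21].

```python
import math


def demand(price: float, a: float, b: float) -> float:
    """Assumed linear demand: number of datasets requested at a given price."""
    return max(0.0, a - b * price)


def profit_maximizing_price(a: float, b: float, marginal_cost: float) -> float:
    """Price maximizing (price - marginal_cost) * (a - b * price)."""
    return (a / b + marginal_cost) / 2.0


def average_cost_price(a: float, b: float, marginal_cost: float, fixed_cost: float) -> float:
    """Lowest price at which revenue covers fixed plus variable cost.

    Solves (price - marginal_cost) * (a - b * price) = fixed_cost and
    returns the smaller root of the resulting quadratic.
    """
    s = a + b * marginal_cost
    discriminant = s * s - 4.0 * b * (a * marginal_cost + fixed_cost)
    if discriminant < 0:
        raise ValueError("costs cannot be recovered at any price under this demand curve")
    return (s - math.sqrt(discriminant)) / (2.0 * b)


def marginal_cost_price(marginal_cost: float) -> float:
    """Marginal or zero cost policy: the price equals the short-term marginal cost."""
    return marginal_cost


# Hypothetical numbers: demand q = 1000 - price, fixed cost 90,000, zero marginal cost.
a, b = 1000.0, 1.0
fixed_cost = 90_000.0
marginal_cost = 0.0

prices = {
    "profit-maximization": profit_maximizing_price(a, b, marginal_cost),
    "average-cost": average_cost_price(a, b, marginal_cost, fixed_cost),
    "marginal-cost": marginal_cost_price(marginal_cost),
}
for name, price in prices.items():
    print(f"{name:>20}: price = {price:7.2f}, datasets requested = {demand(price, a, b):7.1f}")
```

With these assumed numbers the prices are 500, 100 and 0, and the number of datasets requested rises from 500 to 900 to 1000. The example only illustrates the direction of the effect: the lower the price, the wider the re-use of the data.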

Several studies have shown that the case for a marginal or zero cost policy is strong.

A study on the economic effects of statistical data approaches the problem from an economic

theory angle. The study reasons that economic efficiency is maximized when services that

are produced actually change hands in the most efficient manner to avoid waste and fulfill

customer needs. Pricing of public sector information is therefore not economically efficient

because the collection and distribution infrastructure is already funded by taxpayers. In this

case strategies other than zero-cost will prevent the public from enjoying the benefit of these

goods through consumption [22]. Another study shows that the case for marginal or zero cost

policies is quite strong. The marginal cost of delivering data to users other than those primarily

intended approaches zero for many government datasets. Moreover, the business demand for

this data is likely to be high and grow over time. Furthermore, it is likely that the distri-

bution of free data will generate new innovative services. It is certainly safe to assume that

the market will be better equipped to innovate on this data than public institutions facing

heavy regulatory and budget constraints [23].

When we look at the economics of open data in the public and geospatial transport

data we find that similar effects occur. A study on the impact of public sector geographic

information in the Netherlands shows that a reduction in the price of the entire vector map

of the Netherlands from €1 million to €200,000 caused a significant increase in demand and

revenue for the cadastre[24]. Furthermore, a case study of the ’new map of the Nether-

lands’ containing planning information on housing and infrastructure projects maintained

by the Department of Housing and Spatial Planning sheds an interesting light on the increase

of dataset usage. The department brought this dataset under a Creative Commons license11,

making it freely available for downloading. At first, the dataset was bought on average once

every month, but after the data was released under a public license this increased to 200 downloads

per month [24].

A similar study on the economic effects of cadastral information was performed in Spain.

In 2004 the Catalan regional government launched a cadastral information system providing

topographical and geo-data in an open way. Using a survey the cost-benefit effects of this

investment for government organizations (municipalities, regional and public authorities)

were investigated. The study showed that the information system increases the efficiency

and workings of other governmental organizations significantly. Although the investment in

the portal was high (€1.2 million), the benefits within other government authorities amounted to

€2,371,000 in 2006 [25]. We can conclude that in some cases internal governmental organizations

can benefit greatly from open public sector information because data becomes available

in a standardized way to both businesses and other branches of government.
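
The two figures reported for the Catalan portal can be put side by side with simple arithmetic. The sketch below is a rough comparison only: it sets the one-off investment against the benefits of a single year (2006) and, as an assumption on our part, ignores running costs and discounting.

```python
# Figures as reported in the text [25]; the variable names are our own.
investment_eur = 1_200_000      # investment in the portal
benefits_2006_eur = 2_371_000   # benefits within other government authorities in 2006

benefit_cost_ratio = benefits_2006_eur / investment_eur
net_benefit_eur = benefits_2006_eur - investment_eur

print(f"benefit/cost ratio: {benefit_cost_ratio:.2f}")
print(f"net benefit: EUR {net_benefit_eur:,}")
```

Under these simplifications the benefits in 2006 alone are roughly twice the investment.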

Most of the research on open public sector information focuses on a macroeconomic

analysis of data provisioning. Although microeconomic analysis should prove the case

decisively we found that based on the literature currently available the case for distribution

of public sector information at marginal or zero cost is quite strong.

2.2 Dutch government information architecture

In order to understand the context of the ICT landscape in this study we will briefly in-

troduce the information architecture of the Dutch Government. The Dutch Ministry of the

Interior and Kingdom Relations is formally responsible for ICT within the government.

The basic architecture that the central government should follow is formulated in NORA

(Dutch Government Reference Architecture), a set of principles, guidelines and technologies

that branches of government can follow to organize their ICT. The goals of NORA are to guide

individual government bodies in the design of their information architecture and to support

policy making and deployment [27]. Within the architecture three types of principles are defined:

basic principles, collaboration principles and regulations. The basic principles describe the

relation between government, the public and businesses. The collaboration principles de-

scribe interoperability constraints and finally the regulations describe technical constraints,

standards and messages.

In the architecture different components can be identified:

1. Data Sources (basisregistraties): the data sources or ‘basis registries’ contain the various forms of data the government collects.

2. Service Bus (servicebussen): the service bus is a data transportation facility that moves pieces of information through a messaging system.

3. Transaction Gate (transactiepoort): the transaction gate allows organizations to interact with the government on a machine level, for example when applying for a tax refund.

4. Security and Identity: security and identity management are organized at the level of the individual datasets but can be accessed through one identification system called DigiD.

5. Front Office: the front office systems are used by various organizations to interact with citizens and businesses. This can be a government website, but also a civil servant supporting a citizen.

6. Organizations: the model allows different organizations that use similar architectures internally to interact with each other.

The following figure shows the relation between the different components.

Figure 2.2: The Dutch Government Reference Architecture (NORA)

The NORA architecture can be classified as a service-oriented architecture. In a service-oriented architecture, various virtual information services are defined which can be requested by a user. Furthermore, service-oriented architectures use well-defined standards for messages and communication and are built up in a modular fashion. Technical implementations of these service-oriented architectures are usually web services or some other form of information service bus. The Dutch government is still in the phase of constructing this unified information service bus. In this phase the focus is on enabling interoperability by providing basic technical standards and policies that enable information flow between different governmental organizations. In the coming years it can be expected that these systems will evolve towards the alignment of administrative procedures and technical systems[28].

For the deployment of vast amounts of data in an open fashion it is important that both the information service bus and the alignment of technical systems and administrative procedures are well organized.

Reflecting on this architecture in relation to open data, we can identify three problems. First, the architecture does not include a means to deliver raw data (basisregistraties) to businesses. The current model includes a government transaction gate that allows for message transactions such as filing a tax return, and the central front office provides services such as requesting a new passport, but no data interface is provided. Secondly, the current architecture only allows for security and identity management at the front office or transaction gate. The service bus that transports the data is organized internally. This causes problems for open data because both public and non-public data travel over the same bus. Finally, the architecture does not dictate the message or data standards that would be useful when distributing open data. We can conclude that the current architecture acts as a barrier to open data: no central technical infrastructure is in place to deliver the data.

2.3 Stakeholders

In this section we elaborate on our choice of stakeholders and how they relate to the available literature. Most studies on open data consider only ‘the government’ and ‘businesses’ as stakeholders. We will use more specific definitions of stakeholders based on Rowley’s e-government stakeholder definition[31].

1. Data provider: a governmental organization delivering some form of valuable public transport data. The data provider is dependent on central government funding, but can be outside of direct democratic control. The stake of this organization is to fulfill its legal obligation at the lowest cost. An example of this stakeholder group in the Netherlands is the Dutch cadastre.

2. Network Operator: the network operator is the owner of the physical infrastructure of the transport network (i.e. roads, tracks) and can be either a governmental or a non-governmental organization. An example is the rail network operator ProRail. A network operator can also be a data provider if the law forces this stakeholder group to deliver its data at zero cost. As an e-government stakeholder the network operator can be classified as ‘Governmental Organization’.

3. Service Operators: the service operators use these networks to provide travel services. These operators can also be governmental or non-governmental organizations. The stake of the service operator is to provide an efficient and high-quality travel service. An example of this stakeholder group in the Netherlands is the rail operator NS. As an e-government stakeholder the service operators can be classified as ‘Businesses’.

4. Businesses: the businesses are privately owned for-profit organizations that can use data provided by the operators to create services for the traveler. The stake of this group is to get the data at the lowest possible cost in a usable format. As an e-government stakeholder the businesses can be classified as ‘Businesses’. An example of this stakeholder group is the navigation company TomTom.

5. Traveler: the traveler is the end user of the services from both the operators and the businesses. As an e-government stakeholder the traveler can be classified as ‘People as service users’. The stake of this group in this research is to maximize the quality of services and minimize cost.

6. Transport authorities: the transport authorities are the regulatory bodies involved in public transport. As an e-government stakeholder the transport authorities can be classified as ‘Public Administrators’. The stake of this group is to gain a good understanding of the transport networks in order to control safety.


7. Civil Society: civil society consists of citizens and foundations that advocate various causes. As an e-government stakeholder civil society can be classified as ‘People as citizens’. They are interested in the way policies are organized and in their impact on society. The stake of this group in this research is to obtain the transparency and accountability needed to decide on and evaluate policy.

Throughout the study these are the definitions of the stakeholders used.

2.4 The business model

In this section we describe the current business case of open data in the Netherlands. Furthermore, we elaborate on some blind spots in the literature and on the effects on the business cases of different stakeholders.

The current business case of government data starts at the different government organizations that collect and store data. The data is then provided under legal, financial and technical limitations. In the Netherlands, no central policy on these limitations applies. A study on these limitations suggests that 31% of the databases do not allow for commercial re-use. Furthermore, in 72% of the cases the data is available for free but only for non-commercial use. Finally, only 22% of the databases provide access through means other than a web interface, which offers no direct access to the data. Only 4% of the databases are accessible through an API[1]. In the cases where data is not freely available, profit maximization or cost-averaging pricing strategies apply. The data is then sold to businesses that re-use it in their applications. The businesses use some of the data to improve their products. The limitations in this business model cause a lack of economic activity around government data.

We found that a gap exists in the current literature on open data. Most of the research on distribution of public sector information at marginal cost has focused on macroeconomic, policy or transparency effects. We put forward that, to study the case of open data more precisely, the business cases of different stakeholders should be analyzed more thoroughly. In most of the studies conducted the stakeholders defined are ‘government’ and ‘businesses’ or ‘the public’. These narrow definitions leave little room for the investigation of effects other than those on the primary value chain and revenue models. In order to create a good design for open data we need to gain more insight into the business cases of the different stakeholders instead of only looking at the global business model.

2.5 Research Question

Based on our problem definition and the exploration of the subject of open data in the Netherlands we are ready to introduce the research question. In the previous sections we made the economic case for open data and identified the most important causes of our problem. We now need to find out how we can address these causes with our design. We will focus on two causes of the problem:


1. Pricing: we will need to find a pricing strategy that maximizes net-value for both

businesses and government. We will design a business model that deals with this cause.

2. Technology: we will need to find a technical infrastructure to deliver the data.

From our theory section we expect that open data policies will cause changes in the business cases of different stakeholders. We will need to investigate the effects of the design of the new open data business model. Based on the theory and the hypothesis about changes in the business case we can introduce the primary research question.

What changes in the business model for public- and geospatial transport data could be observed when open data would be made available?

The research question aims at finding the effects of an open data business model on various stakeholders. We focus on public and geospatial transport data based on the statistics of the American data portal data.gov. The statistics of this website show that geospatial and transport data are among the most popular datasets that businesses tend to reuse. Furthermore, we focus on the Netherlands in order to be able to study the cases in detail in the amount of time available.

The secondary research question focuses on solving the design question of our technical infrastructure. If the government were to decide on an open data policy, this would lead to significant changes in the information architecture of government organizations. In the current closed model data is used primarily internally, and therefore interfaces to information systems external to the organizations have not been realized. To be able to deliver open data to businesses an interface should be designed. Therefore, the secondary research question is:

What technical infrastructure should be provided in order to deliver open public- and

geospatial transport data to businesses?


Chapter 3

Methods

The goal of this study is to design a business case and technical infrastructure for open data. The study is based on a literature review and on open and structured interviews with various stakeholders and specialists. In addition, various design methods such as requirements analysis, business model generation, ORM modeling and data warehouse modeling have been used. Because open data touches on the economy, privacy and civil society, and is influenced by many different stakeholders such as citizens, businesses, civil society organizations and civil servants, we believe that a literature review and a stakeholder analysis are appropriate methods to explore the subject in depth.

Figure 3.1: The design process

3.1 Literature Review

The literature review serves to establish the theoretical underpinnings of open data. We used the literature review to find the main causes of the problem and to provide context to the topic of open data. Furthermore, we looked into electronic government architectures, specifically the Dutch government’s information architecture NORA.

3.2 Open Interviews

In order to gain more insight into the specific case of open data in the Netherlands and to outline the methods used to design a business case for open data, interviews with various specialists were conducted. These specialists range from government officials and business leaders to civil servants and activists. Based on these interviews and the literature review, the structured interviews for the analysis of the business case were constructed. A list of the interview subjects can be found in the appendix.


3.3 Stakeholder Identification

Based on the open interviews and the literature review we made an analysis of the relevant stakeholders. These stakeholders were used to select respondents for the structured interviews. Furthermore, this identification served as a means to maintain consistent terminology throughout the design phase.

3.4 Structured interviews

Structured interviews were then performed in which the interviewer used a fixed set of questions to gain insight into both the business case and the technical requirements. We chose this interview form because it provides a good basis for comparing the answers that different respondents give. We interviewed two to three respondents from organizations within every stakeholder group that we defined. The interviews were performed in a special interviewing room. Respondents could choose to remain anonymous. All of the conversations were recorded for future reference. The interviews took between 1.5 and 2 hours and were performed during the day. The questions were asked in the same order with every respondent. The language of the interviews was Dutch. Depending on the respondent’s technological background, the business case question set, the interface question set or both sets were used. A list of the interview subjects can be found in the appendix.

3.5 Business case analysis

To be able to gain insight into the low-level effects of open data, an analysis of the business cases of the different stakeholders was performed. The business model generation method[26] was used to analyze these business cases. The method uses nine areas to describe a stakeholder’s business case, which we explain here:

1. Partners: identifies the key partners, such as suppliers or government institutions, and explains the motivation for each partnership.

2. Activities: describes what key activities are performed and how they contribute to the revenue streams.

3. Value Proposition: describes what value is delivered to the customer and what customer need is solved.

4. Customer Relations: describes what type of relationship the organization has with its customers, how costly these relationships are and how they are established.

5. Customer Segments: describes in what markets the organization operates.

6. Distribution Channels: describes the distribution channel of the organization.

7. Resources: describes what resources are necessary in order to create the value proposition.


8. Cost Structure: describes the most important costs inherent in the business model.

9. Revenue Stream: describes the nature of the revenue streams and identifies what value customers are really willing to pay for.
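The nine areas listed above can be captured in a small data structure, which makes it straightforward to compare a stakeholder's current and proposed business case area by area. The following is a minimal sketch, not part of the method itself: the class name, field names and example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class BusinessCase:
    """One stakeholder's business case, one field per area of the method."""
    partners: str
    activities: str
    value_proposition: str
    customer_relations: str
    customer_segments: str
    distribution_channels: str
    resources: str
    cost_structure: str
    revenue_stream: str


def changed_areas(current: BusinessCase, proposed: BusinessCase) -> list[str]:
    """Return the names of the areas whose description differs."""
    return [
        f.name
        for f in fields(BusinessCase)
        if getattr(current, f.name) != getattr(proposed, f.name)
    ]


# Hypothetical example: a business that re-uses government transport data.
current = BusinessCase(
    partners="government data providers",
    activities="building travel services",
    value_proposition="travel information for end users",
    customer_relations="self-service",
    customer_segments="travelers",
    distribution_channels="mobile app",
    resources="developers, licensed data",
    cost_structure="pays for data acquisition",
    revenue_stream="app sales",
)
proposed = replace(current, cost_structure="data at zero or marginal cost")

print(changed_areas(current, proposed))
```

Recording only the areas that differ mirrors the way the results are reported later: an area that is not mentioned for a stakeholder is one in which no relevant change was observed.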

The results of the business case analysis and proposed model are presented in the business

case design section.

3.6 Requirements analysis

For the data warehouse design we used van Lamsweerde’s requirements engineering method[29]. Furthermore, Boehm’s analysis of non-functional requirements was used to gain insight into qualitative aspects of the warehouse design[30]. The requirements engineering method uses a process of scoping, stakeholder analysis, user characteristics definition, product perspective, use case analysis and requirements specification to create a software interface design. In order to account for non-functional requirements that might be important for the interface we looked for usability, safety, efficiency, performance, capacity and interoperability constraints.

3.7 Data Warehouse design

We chose to design a data warehouse as a technical solution for delivering open data to businesses. To design this data warehouse we used a UML-based method[33]. However, instead of using UML to describe the data model, we used Object Role Modeling (ORM)[34]. This specific method was used because we have more experience with this type of modeling, and because it allows for detailed conceptual modeling in a compact schema. The results of this design are presented in the technology design section.


Chapter 4

Business Model Design

In this chapter we propose a design for the business model of open data in the Netherlands.

Furthermore, we analyze the impact of this business model on the different stakeholders.

The current business model of public sector information works as follows. Government bodies collect various forms of transport data and store it for internal use. When a business wants to use this data for commercial purposes the data can be bought. The data is offered under a competing or cost-averaging pricing strategy. Most government organizations do not structure their data according to open standards. Furthermore, various types of license limitations apply to the data. After the data has been sold, the business uses it in an existing product or service, which in turn is sold to an end user.

Figure 4.1: The business model of open data

We propose an open business model. The business model of open data for public and geo-spatial transport data essentially works as follows. Government organizations like the Ministry of Transportation, the cadastre and the public transport network operators publish structured, machine-readable and free data sources in a data warehouse. Businesses then download or link to this data and create new services. These services are then provided to end users. The government provides the data in a structured form based on available open standards.

In this business model the situation for some of the stakeholders changes. The most significant changes occur for the government organizations (i.e. the data provider and network operator stakeholder groups). In the designed business model the following will have to change for these organizations:

1. Pricing Strategy: the pricing strategy for re-use of public sector data has to change from competing or cost-averaging strategies to a free or marginal cost strategy.

2. Legislation: copyright, intellectual property and database law have to be adjusted so that the data can easily be used by businesses.

3. Technical Infrastructure: the organizations have to provide a technical infrastructure to deliver the data sets or web services to businesses.

4.1 Effects of the model

It can be expected that in this business model the economic activity of businesses around this data increases significantly. All of the stakeholders that were interviewed expect a significant increase in economic activity. For example, the developers behind the train iPhone app (Trein) expect that such a development will cause severe competition to create the best travel app on a mobile device. The planning service OV9292 expects not only that competition will increase, but also that the use of public transport will probably increase when travel information is more widely available. Their own research has shown that OV9292 increases the use of public transport by 8%. We can thus expect that more businesses will start to use open data to generate revenue.

Furthermore, it can be expected that new types of innovative services will emerge with open data. In New York, San Francisco and other major cities that opened up their data, various types of travel services emerged within months¹². The respondents from the interviews also expect new and innovative services to emerge when government data is combined with commercial data sets and services. One of the examples mentioned in the interviews was a toilet finding service in Denmark. This service informs citizens with a bladder condition of the location of toilets in their area, a service that could not have been created without open data. With our business model we can expect that the business potential currently untapped in the Netherlands could be opened up. The effects that this business model has on the business cases of the various stakeholders will be explored in the next section.


4.2 Effects on the stakeholder business cases

This section describes the effects of the business model on the specific business cases of the

stakeholders we interviewed. We use the definitions of the different aspects of the business

case introduced in the methods section. For every stakeholder the aspects of the business

case that change are described. If an aspect is not described in this section no relevant

changes were observed.

1. Data provider: for the data provider some significant changes to the business model can be observed. The most significant change is the loss of income due to the different pricing strategy. The revenue streams of these data providers change because they will have to compensate for this loss of income. We propose that the loss of income is compensated by the national government, since it benefits from the effects of open data through taxation. Based on the interviews we can observe that both the cadastre and the providers of transport data fear this loss of income. The cadastre furthermore fears that the national government is not willing to compensate for it. In that case they will either decrease the number of key activities or increase the price of other products they currently deliver to the market.

Furthermore, some organizations will have to provide a technical infrastructure to deliver vast amounts of data to businesses. This infrastructure will change the way distribution channels are organized and will require an investment in technology for some of the organizations. Other areas of the business case of these organizations, such as customer segments, resources and partners, will not change in our business model.

2. Network Operator: for the network operator the most significant changes occur when they are a provider of data. For example, in the railway sector ProRail maintains the network and provides the data on the locations of trains to the different service operators on the network. In this case the change in pricing strategy will decrease their overall income. However, under Dutch public transport law (wet personenvervoer) the network operators in general are already obliged to provide this data to their main customers, the service operators. The most significant change for the network operator is therefore the change in customer segments. When open data is introduced, a new group of customers for the data emerges: businesses.

3. Service Operators: for the service operator changes in the cost structure will occur. Data that was previously only commercially available can now be obtained at zero or marginal cost. For some operators, for example NS, this could mean a significant decrease in the cost of data collection. Furthermore, based on the interviews with OV9292, the availability of free public transport data will increase the number of customers that use their services. This increases the volume of the revenue stream obtained from travel services.


4. Businesses: as for the data providers, the changes to the business model of businesses are significant. In the old model businesses had to pay for the acquisition of data from government bodies. In the proposed model this data is available for free, which significantly lowers the cost of acquiring data products. Furthermore, enforcing the use of open standards will decrease the cost of converting the data into appropriate formats. We can therefore conclude that the cost structure of these businesses changes in the business model.

Furthermore, based on the interviews we can conclude that competition will increase. Respondents expect that the barrier to entering the market with a certain service will be lowered. For example, one of the respondents expects that navigation products of acceptable quality could be made with the map provided by the cadastre. The main reason for this lower barrier is that no significant investment in the acquisition of high-quality mapping data is required when the map can be downloaded for free from the cadastre. Also, the key activities of some businesses can change due to the change in the business model. For example, commercial mapping organizations like Google, TomTom and Navteq currently rely on land surveying and other mapping techniques for their mapping products. At least 20 properties of these mapping products could be made available for free through the cadastre.

5. Traveler: for travelers we cannot really speak of a business case. We will, however, state the obvious changes this stakeholder experiences in our business model. The traveler will see an increase in the number of services available to them. Furthermore, due to the increase in competition, the quality and functionality of the services provided will probably increase.

6. Transport authorities: since the transport authorities play no vital role in the business model we deem them out of scope. One effect we might expect for transport authorities is that the availability of more data will give valuable insight into the performance of the transport networks. This could lead to better policies at the government level.

7. Civil Society: civil society organizations currently play no significant role in the business model of open data. However, it can be expected that civil society organizations will engage in the creation of ‘social’ applications. These applications were previously too expensive to develop because of the data acquisition effort, but become viable in our new model. One example of this type of application is Schoolscope in the United Kingdom, a website that offers parents a benchmark of the quality of schools. Another application reports on hazardous locations in the Manhattan area of New York based on traffic data published by the government.

By using the business model generation method we found that the most significant

changes in our design are a change in cost structure of the providers and users of data.


Chapter 5

Technology Design

One of the causes of the problem is the lack of a technical infrastructure to deliver high-quality data to businesses at high speed. We performed a requirements analysis that has led to a technical solution to our problem. In this chapter we propose a design of a data warehouse for public and geo-spatial transport data.

A data warehouse is essentially a data storage and decision support system based on a variety of different datasets. In business, data warehouses are frequently used as management support tools. A data warehouse is always subject-oriented and records and interprets attributes of these subjects over time. Some examples of subjects in our case are vehicles, stops and travelers. We chose to design a data warehouse rather than a normal database system because a data warehouse allows for decision support (planning) and can cope with multiple sources of different information. The scope of this design is an analysis of the landscape in which the warehouse will operate, a draft architecture of the different data warehouse layers, a data model for the storage of public and geospatial transport data, an interface design and recommendations on standards and hardware. We will not look into front-end applications, query structure, optimization, rollout or maintenance aspects of the data warehouse. We used the UML-based data warehouse design method to create this design[33].

5.1 Landscape

Before we can describe the interface design we need to define the context architecture in relation to the value chain. The data warehouse collects data from different data providers and network operators. This data is processed and packaged in the warehouse. We assume that the Service Interface for Real Time Information (SIRI) standard, CEN/TS 15531¹³, defined by the European Committee for Standardization (CEN), will be used; it covers data on timetables, network monitoring, vehicle monitoring, connection monitoring and a general message service. For the geographical data various vector forms can be distributed. In this study we assume that the Web Map Service (WMS), Web Feature Service (WFS) and Web Map Tile Service (WMTS) standards of the Open Geospatial Consortium (OGC) are used. For the traffic and delay data we suggest using the European Open Travel Data Access Protocol (OTAP) and the standards defined by the National Database Road-traffic (NDW).

Figure 5.1: The data warehouse in its context

After the data is processed and packaged it can be delivered through the interface. Public transport data can be defined as data regarding the physical infrastructure (stops, stations, routes), the timetable (planning, platforms) and the status of the network (delays, outages). Geo-spatial transport data can be defined as data regarding the main motorway network (network, ramps) and the status of the network (traffic jams).

5.2 Warehouse Architecture

This section describes the general architecture of the data warehouse. A data warehouse is generally built up out of four main components. First, there are multiple data sources that provide different sorts of information to the data warehouse. In our example road, train, network and mapping data feed into the data warehouse. After the data has been processed through the different layers of the data warehouse it is offered to users in a data mart. This data mart is a subset of the larger data store and is oriented towards either public transport or road network data. When a user requests certain data from the data mart through the interface (API) it can be re-used in an application. In this model we also included a planning layer that can interpret the different sorts of raw data and return routing and planning information.

We explicitly place this layer outside the data processing part of the data warehouse

because we want to keep this planning capability of the data warehouse optional. We want

to keep this optional because these specific types of planning packages are also used in the

market and might introduce unfair competition to other vendors of planning software.
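The idea of a data mart as a subject-oriented subset of the larger data store, queried through an interface, can be sketched as follows. This is a simplified illustration under stated assumptions: the table, view and function names are hypothetical, SQLite stands in for the warehouse DBMS, and the marts are modeled as views rather than as separately hosted copies. The optional planning layer is not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_records (
    id INTEGER PRIMARY KEY,
    domain TEXT NOT NULL,       -- 'public_transport' or 'road_network'
    kind TEXT NOT NULL,         -- e.g. 'stop', 'delay', 'traffic_jam'
    description TEXT NOT NULL
);
-- Each data mart exposes only the subset relevant to one user group.
CREATE VIEW mart_public_transport AS
    SELECT id, kind, description FROM warehouse_records
    WHERE domain = 'public_transport';
CREATE VIEW mart_road_network AS
    SELECT id, kind, description FROM warehouse_records
    WHERE domain = 'road_network';
""")
conn.executemany(
    "INSERT INTO warehouse_records (domain, kind, description) VALUES (?, ?, ?)",
    [
        ("public_transport", "stop", "example station"),
        ("public_transport", "delay", "example delay"),
        ("road_network", "traffic_jam", "example jam"),
    ],
)
conn.commit()

# Fixed mapping from mart name to view, so callers never supply raw SQL.
ALLOWED_MARTS = {
    "public_transport": "mart_public_transport",
    "road_network": "mart_road_network",
}


def query_mart(connection, mart, kind):
    """Hypothetical interface call: return records of one kind from one mart."""
    view = ALLOWED_MARTS[mart]
    rows = connection.execute(
        f"SELECT kind, description FROM {view} WHERE kind = ?", (kind,)
    )
    return rows.fetchall()


print(query_mart(conn, "public_transport", "delay"))
```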


Figure 5.2: The data warehouse architecture

The source layer of the data warehouse is the physical infrastructure that gathers the data from the different data sources. In our data warehouse the data sources either push the data to the data warehouse at some predetermined interval, or a separate data scraper is used to collect the data. In the extraction layer the scheduling of the data extraction from the data sources is organized. For example, the vector map of the road network will probably not require an update more often than once or twice every week, whereas the location of a train will probably have to be updated every 30 seconds. Some data warehouses feature a staging area that is used to normalize the data and check for quality, coverage and other constraints. Such a staging area would be relevant if a large number of data sources were used and if the quality of this data could not be trusted. Since the providers of the data are all known, agreements can be made on these aspects of the data delivery and we will not require a separate staging area. In the ETL (Extraction, Transformation and Load) layer the data from the extraction layer is transformed into the relevant data structure, metadata is extracted and the data is loaded into the databases. In this process the data is checked for integrity, cleaned and sometimes translated. The ETL stage does not operate directly on the databases of the data warehouse but uses staging tables. Depending on the requirements of the data and the update frequency, the steps used can vary.
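The scheduling in the extraction layer could be sketched as follows. The feed names and the one-week interval for the vector map are assumptions taken loosely from the examples in the text; a real scheduler would also have to deal with failed extractions and pushed data.

```python
from dataclasses import dataclass

@dataclass
class Feed:
    """A data source with its own refresh interval (names are hypothetical)."""
    name: str
    interval_s: int                      # seconds between two extractions
    last_run_s: float = float("-inf")    # time of the last extraction

def due_feeds(feeds, now_s):
    """Return the feeds whose interval has elapsed at time now_s."""
    return [f for f in feeds if now_s - f.last_run_s >= f.interval_s]

def run_extraction(feeds, now_s):
    """Mark every due feed as extracted and return the names of those feeds."""
    names = []
    for feed in due_feeds(feeds, now_s):
        feed.last_run_s = now_s
        names.append(feed.name)
    return names

WEEK_S = 7 * 24 * 3600
feeds = [
    Feed("road_vector_map", WEEK_S),   # slow-changing: about once a week
    Feed("train_locations", 30),       # fast-changing: every 30 seconds
]

first = run_extraction(feeds, now_s=0)     # both feeds are due initially
second = run_extraction(feeds, now_s=30)   # only the train feed is due again
```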

After the ETL layer the data is processed in the storage layer. This layer is basically the database management system (DBMS) of the data warehouse. The primary task of this layer is to store and retrieve data from the data warehouse. It relies on the ACID properties (atomicity, consistency, isolation, durability) to guarantee that data warehouse transactions are processed reliably. The storage layer pushes different types of data at set intervals to the two data marts that we included in the design. Each data mart is a subset of the data present in the data warehouse that is relevant to one user group. We use two different data marts for redundancy purposes. First, the data marts can be hosted on different hardware environments than the data warehouse. This makes sure that data can still be retrieved if the data warehouse for some reason goes offline. Furthermore, if these data marts did not exist and the API were coupled directly to the data warehouse, a failure in the data warehouse would cause both the vital road and public transport information infrastructure to go offline together. This could lead to major delays on both the public transport and road network. Finally, the data marts allow for a much cheaper failover environment than the data warehouse. Because a data mart is essentially a big cache of a subset of the data warehouse, it could be mirrored onto different physical locations. The final layer in our data warehouse design is the interface with the end-users. This interface design is defined further on in this chapter.
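The redundancy argument can be illustrated with a small in-memory sketch. The class names, the record layout and the domain labels are hypothetical; the point is only that a data mart keeps serving its last copy when the warehouse is unreachable.

```python
class DataWarehouse:
    """In-memory stand-in for the central store (hypothetical)."""
    def __init__(self):
        self.online = True
        self.records = []   # each record: {"domain": ..., "key": ..., "value": ...}

    def query(self, domain):
        if not self.online:
            raise ConnectionError("data warehouse is offline")
        return [r for r in self.records if r["domain"] == domain]

class DataMart:
    """Holds a copy of the warehouse subset for one user group."""
    def __init__(self, domain):
        self.domain = domain
        self.cache = []

    def read(self):
        return list(self.cache)

def push_to_marts(warehouse, marts):
    """Copy each mart's subset from the warehouse; on failure the marts keep their old data."""
    try:
        subsets = {m.domain: warehouse.query(m.domain) for m in marts}
    except ConnectionError:
        return False
    for m in marts:
        m.cache = subsets[m.domain]
    return True

warehouse = DataWarehouse()
warehouse.records = [
    {"domain": "public_transport", "key": "train_1", "value": "on time"},
    {"domain": "road", "key": "A28", "value": "traffic jam"},
]
pt_mart, road_mart = DataMart("public_transport"), DataMart("road")
push_to_marts(warehouse, [pt_mart, road_mart])

warehouse.online = False                                    # simulate an outage
pushed = push_to_marts(warehouse, [pt_mart, road_mart])     # fails, marts keep serving
```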

5.3 Data Model

To be able to store data in our data warehouse we will have to model the data first. For the geo-data and traffic data some good internationally accepted data models are already freely available. We chose to adopt these standards in our design. For the geo-spatial information the OpenGIS Web Map Service standard will be used [35]. The road data model will be based on the model already used by the Dutch National Road Traffic Database (Nationale Databank Wegverkeer, note 14). However, such a well-defined data model is missing for public transport data in the Netherlands. Some effort has been put into the BISON standard. This standard, however, only models the interfaces between various service providers in the public transport domain. For the public transport data a draft version of the BISON standard and the interviews have been used to derive a data model. We tried to combine the BISON standard with the already available CEN/TS 15531 standard for public transport defined by the European Committee for Standardization.

Figure 5.3: Available data models

Based on the service interface requirements we used the Object Role Modeling (ORM)

technique[34] to generate the model for public transport. The model only describes the

conceptual data relations in the data warehouse. We’ve used nine elementary object types

to describe the domain of public transport.

The vehicle object type is the physical means of transportation (e.g. train, bus, taxi) and has various attributes such as a location, a capacity and the availability of a toilet. A vehicle is maintained by a certain service operator, which only has a name in our model. On the infrastructure side we defined a stop, a platform and a connection. A stop is a physical location where a vehicle can stop to drop off travellers. A stop can have multiple platforms. The route between two stops or platforms can be defined as a connection, which has a distance and can be available or unavailable. A connection is maintained by a network operator. Furthermore, the unique combination of a connection, a vehicle and a planning item results in a schedule. The planning item contains a departure and an arrival timestamp (date and time) and may contain a note for the operator. Different planning items together generate a route for a passenger. When the planning changes, an exception can be created. This exception is a message to the travellers and operators that a certain planning item has changed. An exception can also be a single message that has no influence on the planning.

Figure 5.4: The ORM data model for public transport
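As a rough illustration, the nine object types could be written down as plain data classes. This is a sketch under assumptions: the attribute names, the types and the example values are invented, and connections are modelled between stops only, whereas the ORM model also allows platforms as end points.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass(frozen=True)
class ServiceOperator:
    name: str

@dataclass(frozen=True)
class NetworkOperator:
    name: str

@dataclass
class Vehicle:
    kind: str                         # e.g. "train", "bus", "taxi"
    capacity: int
    has_toilet: bool
    operator: ServiceOperator
    location: Optional[str] = None

@dataclass
class Platform:
    name: str

@dataclass
class Stop:
    name: str
    platforms: List[Platform] = field(default_factory=list)

@dataclass
class Connection:
    origin: Stop
    destination: Stop
    distance_km: float
    available: bool
    operator: NetworkOperator

@dataclass
class PlanningItem:
    departure: datetime
    arrival: datetime
    note: Optional[str] = None

@dataclass
class Schedule:
    connection: Connection
    vehicle: Vehicle
    item: PlanningItem

@dataclass
class TravelException:
    message: str
    affected: Optional[PlanningItem] = None   # None: message without planning impact

def duration_minutes(schedule):
    """Planned travel time of a schedule in minutes."""
    delta = schedule.item.arrival - schedule.item.departure
    return delta.total_seconds() / 60

# Hypothetical example data.
rail = ServiceOperator("Example Rail")
infra = NetworkOperator("Example Infra")
groningen = Stop("Groningen", [Platform("1"), Platform("2")])
zwolle = Stop("Zwolle")
item = PlanningItem(datetime(2011, 1, 31, 9, 0), datetime(2011, 1, 31, 10, 0))
schedule = Schedule(Connection(groningen, zwolle, 100.0, True, infra),
                    Vehicle("train", 400, True, rail), item)
delay = TravelException("Delayed by 10 minutes", affected=item)
notice = TravelException("Lift out of service at platform 2")
```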

5.4 Interface

To connect the data warehouse to the business users an Application Programming Interface (API) will be constructed. The interface will act as a data provisioning system for public transport and geo-spatial data. A separate API will be constructed for each of the two data types: one for the public transport data and one for the geo-spatial transport data. The interface will be run as a web service that allows for access through the HTTP protocol (over the web). The interface will be constructed on a Representational State Transfer (REST) communication bus that uses messages formatted in Extensible Markup Language (XML). The choice for REST is based on its focus on resource states that can be retrieved and changed through the interface using the common HTTP methods (like GET, POST, PUT, DELETE). This type of API provides scalability, safety, stability, generality of interfaces and latency reduction, and is flexible enough to be extended with more services in the future. For the messages that are sent through the interface the XML standard will be used. XML is a W3C-approved standard for machine-readable document markup. It provides enough freedom to define custom schemas for the purpose of geo-spatial and public transport data provisioning without losing standardization.
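A message exchanged over such an interface might look like the sketch below, which builds and parses a small XML document with Python's standard library. The element and attribute names form a hypothetical schema; the actual message schemas are not defined in this chapter.

```python
import xml.etree.ElementTree as ET

def build_vehicle_message(vehicle_id, latitude, longitude):
    """Serialise one vehicle position as an XML document (schema is hypothetical)."""
    root = ET.Element("vehicleMonitoring")
    vehicle = ET.SubElement(root, "vehicle", id=vehicle_id)
    ET.SubElement(vehicle, "latitude").text = str(latitude)
    ET.SubElement(vehicle, "longitude").text = str(longitude)
    return ET.tostring(root, encoding="unicode")

def parse_vehicle_message(xml_text):
    """Read the vehicle position back from the XML document."""
    root = ET.fromstring(xml_text)
    vehicle = root.find("vehicle")
    return {
        "id": vehicle.get("id"),
        "latitude": float(vehicle.findtext("latitude")),
        "longitude": float(vehicle.findtext("longitude")),
    }

message = build_vehicle_message("train-1", 53.21, 6.56)
position = parse_vehicle_message(message)
```

A client in any language with an XML parser can consume the same message, which is the main reason for choosing a standardized format.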

A REST interface can be built on different programming languages, databases and services. Since the systems that are being used by the different data providers are unknown to us, some assumptions have to be made. We assume that the data providers want high flexibility and extensibility in the programming language. Furthermore, they want low implementation and maintenance costs. Finally, they want the interface to be compatible with the wishes of third party developers.

Taking into account these requirements the interface will be built in Python. Python is a multi-paradigm language that allows programmers to combine different styles of coding. Python is a stable language that is provided natively in many Linux distributions and can be used together with Oracle web servers. Many large organizations like Google, ABN AMRO, CERN and NASA use Python for their interfaces.

Depending on the relation with the data provider (either local caching or direct API) a database is required. The interface will be built on an Oracle Database 11g database. The database can be manipulated using Structured Query Language (SQL), which is an international standard for interaction with relational databases.
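The kind of SQL interaction meant here can be sketched with Python's built-in sqlite3 module, used purely as a stand-in for the Oracle database so that the example runs anywhere. The table and column names are hypothetical and only mirror the stop and platform object types of section 5.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE stop (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE platform ("
    " id INTEGER PRIMARY KEY,"
    " stop_id INTEGER NOT NULL REFERENCES stop(id),"
    " label TEXT NOT NULL)"
)
conn.executemany("INSERT INTO stop (id, name) VALUES (?, ?)",
                 [(1, "Groningen"), (2, "Zwolle")])
conn.executemany("INSERT INTO platform (stop_id, label) VALUES (?, ?)",
                 [(1, "1"), (1, "2"), (2, "1")])

# Number of platforms per stop, using a standard join and aggregate.
rows = conn.execute(
    "SELECT s.name, COUNT(p.id) FROM stop s"
    " JOIN platform p ON p.stop_id = s.id"
    " GROUP BY s.name ORDER BY s.name"
).fetchall()
conn.close()
```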

The interface will deliver data through web services. When a user registers for an API key the services can be used. We split the API for the rail and road network into two separate APIs for redundancy. We believe this redundancy is required because if the system were to be one single API, a failure would result in no transportation data whatsoever. For the public transport data the following categories of service calls to the API can be defined:

1. Planning Services: the planning services category contains several planning and decision services. These services are used to determine optimal routes based on various parameters. The most important services are the 'Planned Timetable Service', which returns the current timetable, and the 'Estimated Timetable Service', which also takes into account the actual state of the network and adjusts the planning accordingly.

2. Monitoring Services: the monitoring services category contains several network monitoring services. The goal of these services is to determine the current state of the networks and vehicles. The exception monitoring service provides information on network exceptions like the failure of turnpikes. The stop monitoring service provides information on the stations and platforms. The vehicle monitoring service provides information on the location of individual vehicles. Finally, the network and connection monitoring service provides meta-information on the state of the network.

3. Other Services: the other services category contains services that relate to pricing, messaging and interaction with the network operator.

For the geo-spatial transport data the following categories of service calls to the API can be defined:

1. Planning Services: the planning services category contains two services that return the delays on specific sections of road. Furthermore, the estimated capacity service returns the probability of a capacity shortage on a certain section of road based on real-time measurements and statistical data.

2. Monitoring Services: the monitoring services category contains several network monitoring services. The goal of these services is to determine the current state of the network and connections. Several different services report on planned maintenance, incidents, connections etc.

3. Map and Network Services: the map and network category contains services returning static data on the road network. Several services provide a download of the latest version of the road vector map, static information on junctions and exits, and static information on road facilities and signs.

4. Other Services: the other services category contains services that relate to pricing, messaging and interaction with the network operator. Furthermore, it provides video streams and data from weather stations at the road side.

A more extensive analysis of the services and the design can be found in the appendix.
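How the service categories could map onto REST resources is sketched below as a plain dispatch function. The URL paths, parameter names, registered key and response elements are all assumptions made for illustration; the appendix, not this sketch, holds the actual service design.

```python
import xml.etree.ElementTree as ET

REGISTERED_KEYS = {"demo-key"}   # hypothetical set of registered API keys

def planned_timetable(params):
    """Planning service: would return the planned timetable for a stop."""
    root = ET.Element("plannedTimetable", stop=params.get("stop", ""))
    return ET.tostring(root, encoding="unicode")

def vehicle_monitoring(params):
    """Monitoring service: would return the position of one vehicle."""
    root = ET.Element("vehicleMonitoring", vehicle=params.get("vehicle", ""))
    return ET.tostring(root, encoding="unicode")

# One route per service; the category is the first path segment.
ROUTES = {
    ("GET", "/planning/planned-timetable"): planned_timetable,
    ("GET", "/monitoring/vehicle"): vehicle_monitoring,
}

def handle(method, path, params, api_key):
    """Return (HTTP status code, XML body) for one request."""
    if api_key not in REGISTERED_KEYS:
        return 401, "<error>unknown API key</error>"
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, "<error>unknown service</error>"
    return 200, handler(params)

status, body = handle("GET", "/planning/planned-timetable",
                      {"stop": "Groningen"}, "demo-key")
```

Because the public transport and road APIs are separate, each would have its own route table of this kind.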

5.5 Hardware

The data warehouse will have to run on a solid physical infrastructure. We present some recommendations on the hardware of the data warehouse. We have to take into account the scalability, parallel processing capabilities, database management / hardware combination and cost effectiveness of the hardware environment. Based on the expected usage of the data warehouse we can expect that the system will sometimes require a high peak capacity. For example, when major malfunctions of the public transport system occur, the number of API requests per minute can triple. Because we cannot plan for these types of outages, our hardware will have to be able to cope with these peak loads. Furthermore, since high volumes of API requests are performed on the system, parallel processing support could increase reliability and speed. Finally, it is important that the software and operating systems used match the database management tool that we selected.

The goal of this recommendation is to find a solution that is highly reliable and cost-efficient. We recommend the use of a cloud-oriented hardware environment. In a cloud server setup virtual server capacity is rented from a cloud infrastructure provider like Amazon. The advantage of cloud-operated services is that they can scale elastically with end-user demand. Furthermore, cloud infrastructure providers have preconfigured virtual servers readily available for use. This will reduce the cost of maintenance personnel significantly. A possible specification for this hardware could be:

Amazon Elastic Compute Cloud (Amazon EC2, note 15)

Servers: High-Memory Double Extra Large Instance: 34.2 GB of memory, 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each), 850 GB of local instance storage, 64-bit platform. This setup allows for high transaction volumes.

Operating System: Oracle Enterprise Linux

Database System: Oracle Database 11g

Application Server (running Python): Oracle WebLogic Server

Service Packages: Amazon Elastic Block Store, Elastic IP Addresses, Amazon Virtual Private Cloud, Amazon CloudWatch, Auto Scaling, Elastic Load Balancing

5.6 Qualitative Aspects

The final design specifications for this data warehouse are of a non-functional nature. We investigated the performance aspects of the database based on the interviews. For the geo-spatial data we can expect 5,000-10,000 requests per minute. For the public transport data we expect 500 planning requests, which we estimate will cause 5,000 requests per minute. We were unable to retrieve the expected number of requests for the road network; we estimate it to be 5,000 per minute. Taking the upper bound of the geo-spatial estimate, the total number of requests that the data warehouse should handle is therefore 20,000 API requests per minute.
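The arithmetic behind this total, and what it means per second, can be checked with a few lines. Applying the tripling of requests mentioned in section 5.5 to the whole total is an assumption made here to obtain a rough peak figure; the text only states it for public transport malfunctions.

```python
# Estimated load per data type in requests per minute (figures from the text;
# the upper bound of the geo-spatial range is used).
requests_per_min = {
    "geo_spatial": 10_000,
    "public_transport": 5_000,
    "road_network": 5_000,
}

total_per_min = sum(requests_per_min.values())   # 20,000 requests per minute
total_per_sec = total_per_min / 60               # about 333 requests per second

PEAK_FACTOR = 3                                  # assumption: whole load triples
peak_per_min = total_per_min * PEAK_FACTOR
peak_per_sec = peak_per_min / 60
```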

The update frequency of the data depends on the specific type of data. The vector map is updated twice a year, while the location of trains has to be updated every 30 seconds. The required uptime of the entire system has been set at 99.5%. Safety requirements are quite low because all the data in the system is already available to the public. To use the API the user has to register using an encrypted hash key. With this key possible fraud can be traced. In terms of usability all the relevant standards have been adopted in the design.
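Two of these requirements lend themselves to a short sketch: the downtime that a 99.5% uptime target still allows per year, and one possible way to handle API keys so that requests can be traced to a registered user. The key scheme (a random key of which only the SHA-256 digest is stored) is an assumption; the design does not specify how the hash key is generated or checked.

```python
import hashlib
import hmac
import secrets

UPTIME_TARGET = 0.995
HOURS_PER_YEAR = 365 * 24
allowed_downtime_hours = (1 - UPTIME_TARGET) * HOURS_PER_YEAR   # roughly 43.8 hours

def issue_api_key(registry, user):
    """Create a random key for a user and store only its SHA-256 digest."""
    key = secrets.token_hex(16)
    registry[hashlib.sha256(key.encode()).hexdigest()] = user
    return key

def identify(registry, key):
    """Return the registered user for a key, or None if the key is unknown."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    for stored, user in registry.items():
        if hmac.compare_digest(stored, digest):   # constant-time comparison
            return user
    return None

registry = {}
key = issue_api_key(registry, "developer@example.org")   # hypothetical user
owner = identify(registry, key)
```

Logging the identified user with every request would provide the trace that the text refers to.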

Chapter 6

Discussion

In this chapter we reflect on some of the effects of our open data design. Furthermore, we comment on our findings in relation to the available research on this subject.

6.1 Effects on businesses

One of the main causes that we identified for the lack of economic activity around government data was the pricing model governments currently use. The current literature on open data has only investigated these effects based on macro-economic analyses and models. Our study indicates that, at the level of a stakeholder analysis, these effects seem to be consistent with the literature. The respondents expect a significant increase in competition in the fields where the data were to be made open.

6.2 Changes in government cost structures

For the data providers the introduction of open data policies causes their cost structure to change significantly. Where the organization could previously rely on a steady source of income from the commercial pricing of data, it will now have to either cut costs or find alternative sources of funding. Finding alternative funding will leave the data providing organizations no choice but to request budget increases from the national government. Alternatively, the prices of other services, like the cadastre excerpt, will increase to compensate for the loss of income. Cutting costs will cause either a decrease in the quality of service or a decrease in the number of services offered.

Governments should be aware that this is a distribution problem. The national government

will be the net beneficiary of certain policies due to taxation of the services provided on

open data. Therefore it would be logical that the national government compensates data

providing branches for the loss of income.

6.3 Loss of intellectual property and market disturbance

In this study we observed that governments sometimes tend to engage in activities that could be seen as market activities, such as consultancy and additional services offered together with the primary services the governmental body provides. This creates a market in which government organizations compete with businesses. Especially in the open data debate this causes friction between businesses and the government. Governments should be aware that they can cause severe market disturbances in certain sectors when implementing these policies.

Businesses that own data sets competing with government data sets that are currently proprietary expect that their intellectual property will lose value. For example, the vector map of the Netherlands directly competes with mapping services provided by commercial organizations like TomTom, Navteq and Tele Atlas. Although these maps serve a different purpose and are much more detailed, the introduction of open data policies will significantly lower the entry barrier for competitors. Some organizations fear that this will seriously damage the intellectual property enclosed in the maps and see this as unfair competition and therefore as governmental market disturbance.

6.4 Legal: ensuring coverage, quality, privacy and neutrality of data

We found that the definition of open data leaves some debate on how licensing should work. Some authors claim that open data should imply that governments abandon all rights they could vest in the data. This would mean that no copyright, database right or other information right can be claimed. We believe that this would be unwise for two reasons. First, abandoning these rights would require massive changes in all kinds of copyright, database and trading laws. We believe that this could impair the adoption of open data policies by different branches of government. Second, governments should be able to forbid some forms of use of the data when this is in the public interest. For example, governments should be able to demand neutral usage of the data. In the case of transport data, a planning service could be constructed on public data that favours some network operator when suggesting routes to travellers. Furthermore, the quality of the data maintained by the governmental body should remain intact in some cases. For example, in the case of the cadastre a legal status can be attributed to certain locations in the country. If such a status is attributed to a piece of property by a commercial service that references the cadastre as the source of the data, citizens could sue if the information is misrepresented. Also, privacy issues may apply to some data sets that are distributed. For example, the cadastral register could be abused by large corporations like Google to create detailed records of individuals.

We propose a licensing structure (Data Commons) which can be used by both companies

and government bodies controlling the legal status of the data provided. These licenses

can use some of the attributes that are currently available in licensing of creative works

(Creative Commons) like share-alike and non-commercial. The license should be expanded

with additional attributes like neutral, privacy, quality and coverage.

6.5 Data vs. Services

Another interesting finding is the somewhat ambiguous nature of the word ’data’ used in

open data policy debates. In terms of government data this could mean static structured

data, or a stream of real-time data. If governments start to publish web-services that provide

dynamic data some issues arise. In public transport data a planning service would not only

provide the ’raw data’ of the timetable and the possible routes, but would also provide a

routing algorithm. This intelligence that is added to the data may lead to unfair competition.

From a technical perspective, there is also a big difference between delivering a whole static

data set or providing a web-service. Governments should carefully consider what types of

dynamic data they are willing to provide to the public.

6.6 Risks of the design

Finally, when we look into the design of our technical infrastructure some topics should be discussed. First of all, the design is focused primarily on the market by bringing together all the relevant data for transportation. This may conflict with the reality of governmental decision making: data providing parties may not want to work together in creating such a data warehouse. Furthermore, the design does not have a specific problem owner. We would expect it to be operated by the Ministry of Transportation. However, the government would have to start up a project of enormous size to realize this data warehouse, and the risk of such a project not succeeding is quite high in the Netherlands. Further study into the execution of such a design would be needed to determine if one could 'start out small' and expand the project according to its success. Finally, no vendors of the proposed cloud-oriented hardware environment are located in the Netherlands. Legislation could forbid the use of cloud infrastructure situated somewhere else in the European Union.

Notes

1 Wikipedia, http://nl.wikipedia.org/wiki/Kadaster, accessed December 23rd, 2010
2 BarCamp, http://en.wikipedia.org/wiki/Bar_camp
3 Sunlight Labs, http://sunlightlabs.com/people/, accessed January 2nd, 2011
4 Memorandum on Transparency and Open Government for the Heads of Executive Departments and Agencies (2009), p. 2, President Barack Obama
5 Data.gov Community, http://www.data.gov/community
6 UK Data Portal, http://data.gov.uk
7 Catalan Open Data: Dades Obertes Gencat, http://dadesobertes.gencat.cat
8 Norway Data Portal, http://data.norge.no/
9 PSI Directive 2003/98/EC, http://ec.europa.eu/information_society/policy/psi/docs/pdfs/directive/psi_directive_en.pdf
10 Visby Declaration, http://ec.europa.eu/information_society/eeurope/i2010/docs/post_i2010/additional_contributions/conclusions_visby.pdf
11 Creative Commons, http://www.creativecommons.org
12 San Francisco App Showcase, http://datasf.org/showcase/
13 European Committee for Standardization, Service Interface for Real time Information: Whitepaper, 09-01-2010
14 Nationale Databank Wegverkeer, http://www.ndw.nu/pagina/nl/4/databank/31/data/
15 Amazon AWS Cloud, http://aws.amazon.com/ec2/

Bibliography

[1] te Velde et al., Open Data in Nederland: Stand van zaken toegang datasets rijksoverheid. Dialogic / Ministerie van Binnenlandse Zaken, p. 10, July 2nd 2010.

[2] Robinson et al, Government Data and the Invisible Hand. Yale Journal of Law and

Technology, Fall 2008.

[3] Frissen, V.; Slot, M.; Adrichem, L et al., De duurzame informatiesamenleving: jaarboek

ict en samenleving 2010. Sociaal Economische Raad, 2010.

[4] Castells, M., The rise of the network society. Wiley-Blackwell Publishing, ISBN

0631221409 2000.

[5] Antonijevic, S.; Gurak, L.J., Trust in Online Interaction: An Analysis of the Socio-

Psychological Features of Online Communities and User Engagement. Rinascimento Dig-

itale, p1. 2009.

[6] OECD, Participative Web and User-Created Content, Web 2.0, Wikis and Social Net-

working, 2007.

[7] Madey, G.; Freeh, V.; Tynan, R., The open source software development phenomenon - an analysis based on social network theory, Eighth Americas Conference on Information Systems, 2002.

[8] Socrata, Open government data benchmark study, Vision Critical Research Group, Au-

gust 2010.

[9] de la Beaujardiere, J., OpenGIS Web Map Server Implementation Specification, Version:

1.3.0, OpenGIS Implementation Specification, 2006-03-15.

[10] Frissen, V.; van Staden, M.; Huijboom, N.; Kotterink, B.; et al., Naar een User Generated State? De impact van nieuwe media voor overheid en openbaar bestuur, TNO / Ministerie van Binnenlandse Zaken en Koninkrijksrelaties, pp. 62-65, 2008.

[11] Software Transparency Group, Naar een Scope of Transparency, Software Transparency Group - PUC-Rio, July 2009.

[12] Relly, J.E.; Sabharwal, M., Perceptions of transparency of government policymaking: A cross-national study, Government Information Quarterly 26, 2009, pp. 148-157.

[13] Bertot, J.C.; Jaeger, P.T.; Grimes, J.M., Using ICTs to create a culture of transparency: E-government and social media as openness and anti-corruption tools for societies, Government Information Quarterly 27, July 2010, pp. 264-271.

[14] Weber, R.H., Transparency and the governance of the Internet, Computer Law & Security Report, Volume 24, Issue 4, 2008, pp. 342-348.

[15] McIvor, R.; McHugh, M.; Cadden, C.; Internet technologies: supporting transparency

in the public sector, International Journal of Public Sector Management, Vol. 15 Iss: 3,

pp.170 - 187.

[16] Weiss, P., Borders in cyberspace: conflicting public sector information policies and their

economic impacts., 2004.

[17] PIRA, Commercial exploitation of Europe’s public sector information., European Com-

mittee, 2000.

[18] Economist Intelligence Unit; IBM Institute for Business Value, Digital economy rankings 2010 - Beyond e-readiness, Economist Intelligence Unit, June 2010, p. 4.

[19] Pollock, R., The Value of the Public Domain, Cambridge University, Institute for Public

Policy Research, 14 July 2006.

[20] Kim, S.; Kim, H.J.; Lee, H., An institutional analysis of an e-government system for anti-corruption: The case of OPEN, Government Information Quarterly, 5 November 2008.

[21] Pollock, R., The economics of public sector information, Cambridge University, Cam-

bridge Working Papers in Economics, May 2009.

[22] Nilsen, K., Enhancing Access to Government Information: Economic Theory as It

Applies to Statistics Canada, University of Western Ontario, Canada, The Socioeconomic

Effects of Public Sector Information on Digital Networks, National Academy of Sciences,

2009.

[23] Pollock, R.; Newbery, D.; Bently, L., Models of Public Sector Information Provision

via Trading Funds, Cambridge University, Commissioned by Department for Business,

Enterprise and Regulatory Reform (BERR) and HM Treasury in July 2007, February

2008.

[24] Donker,F.W., Different PSI Access Policies and Their Impact, Delft University of Tech-

nology, The Netherlands, The Socioeconomic Effects of Public Sector Information on

Digital Networks, National Academy of Sciences, 2009.

[25] Almirall, P.G.; Bergad, M.M.; Ros, P.Q., The Socio-Economic Impact of the Spatial Data Infrastructure of Catalonia, Universitat Politècnica de Catalunya, Centre of Land Policy and Valuations, Commissioned by European Commission Joint Research Centre Institute for Environment and Sustainability, 2008.

[26] Osterwalder, A.; Pigneur, Y., Business model generation, ISBN: 978-0-470-87641-1

2010, John Wiley & Sons, 281 pages, 2009.

[27] Zwienink, S, NORA 3.0 Katern strategie, GBO Overheid, 2010.

[28] Guijarro, L., Interoperability frameworks and enterprise architectures in e-government initiatives in Europe and the United States, Communications Department, Technical University of Valencia, Camino de Vera, Government Information Quarterly 24, pp. 89-101, 2007.

[29] van Lamsweerde, A., Requirements engineering: from system goals to UML models to

software specifications, John Wiley & Sons, 2009.

[30] Boehm, B.I., Software engineering: a holistic view, Oxford University Press, 1992, p. 176.

[31] Rowley, J., e-Government stakeholders: who are they and what do they want, International Journal of Information Management, December 2010.

[32] Verveld, J., Business Case: Nationale Databank Openbaar Vervoer (NDOV), the joint Dutch public transport operators, August 2009.

[33] Prat, N.; Akoka, J.; Comyn-Wattiau, I., A UML-based data warehouse design method,

Decision Support Systems, issue 42, 2006.

[34] Halpin, T., Object-Role Modeling (ORM/NIAM), Handbook on Architectures of Infor-

mation Systems, Springer, Heidelberg, Ch. 4., 1998.

[35] de la Beaujardiere, J., OpenGIS Web Map Server Implementation Specification, Version:

1.3.0, OpenGIS Implementation Specification, 2006-03-15.

Chapter 7

Appendix

.1 Requirements Document

.2 Interview Schema

.3 Final Presentation

.4 List of Interviews

In this study two types of interviews were used, expert meetings that were performed in an

open fashion and structured interviews. The following people and organizations have been

consulted:

Expert Meetings

drs. Ir. T.A. van den Broek - Functie - TNO

dr. B. Kottering - TNO

Noor Hijeboom - TNO

Frank Berkers - TNO

Lex Slaghuis - Hackdeoverheid

Valerie Frissen - Erasmus University

Henri Rauch - Ministry of the Interior

Mark Hartman - ICT Office

Jan Willem Boissevain - Logica

Wout Hoffman - Ministry of the Interior and Kingdom Relations

Structured Interviews

D. Eertink - Kadaster

M. Salzmann - Kadaster

A senior manager from a navigation or mapping company

Another senior manager from a navigation or mapping company

A. Quarles van Ufford - OV9292

T.Wildvalk - OV9292

H. Hoff - Openstreetmap

D. Stevensen - Trein (iPhone App)

The Dutch railway operator NS, and the National Datawarehouse were invited to par-

ticipate in this study but were not willing or able to cooperate.

.5 Acknowledgement

This thesis would not have been possible without the support of the following people:

prof. dr. H.G. Sol

drs. Ir. T.A. van den Broek

dr. B. Kottering

dr. F.T.H.M. Berkers

N. Buur MA

drs. J.S. van Grieken

drs. B. Teeuwen

Ir. M. Schenkel

Ir. M. van de Schootbrugge

the dedicated volunteers at Het Nieuwe Stemmen

the social hackers at Hackdeoverheid

the support of my colleagues at TNO

List of Figures

2.1 The data value chain

2.2 The Dutch Government Reference Architecture (NORA)

3.1 The design process

4.1 The business model of open data

5.1 The data warehouse in its context

5.2 The data warehouse architecture

5.3 Available data models

5.4 The ORM data model for public transport