1 Antonio Rogmann (ZEFc), Universität Bonn Data Management in the GLOWA Volta Project Data Management and Application of GIS and Remote Sensing in Natural Resources Management Training Workshop Data Management in the GLOWA Volta Project Antonio Rogmann Center for Development Research (ZEFc) University of Bonn Wednesday, December 12 – Friday, December 14, 2007 DGRE, Ouagadougou, Burkina Faso


Data Management in the GLOWA Volta Project

Data Management and Application of GIS and Remote Sensing in Natural Resources Management Training Workshop

Data Management in the GLOWA Volta Project

Antonio Rogmann

Center for Development Research (ZEFc), University of Bonn

Wednesday, December 12 – Friday, December 14, 2007 DGRE, Ouagadougou, Burkina Faso


Data Management

Content

1. Data Management: Problems, Solutions and Challenges

2. Data Management Workflow

3. Data Management Infrastructure: Concept, Components and Interfaces


Data Management: Problems

Survey of GLOWA Volta partner institutions and stakeholders at the Partners' Capacity Needs Assessment Workshop (31 May – 1 June 2007, Accra, Ghana)

Purpose: to understand

► the relationships between the institutions in terms of data exchange and data flows related to water management
► the data environment: software / models in use, data storage and access facilities, hardware
► a defined set of problems in managing (and getting access to) data

as a precondition for

► adjusting the data management system of the GLOWA Volta Project to the requirements of the partners
► offering the partners solutions that increase the quality of data management


Consequences of lack of data management

Institutions participating in the survey:

► Coalition of NGOs in Water and Sanitation (CONIWAS)
► Kwame Nkrumah University of Science and Technology, Kumasi (KNUST)
► Soil Research Institute, Council for Scientific and Industrial Research (SRI)
► Water Research Institute, Council for Scientific and Industrial Research (WRI)
► Hydrological Service Department (HSD)
► Water Resources Commission (WRC) (2 participants)
► Hydrological Service Department (HSD)
► Ghana Irrigation Development Authority (GIDA)
► Water Research Institute, Council for Scientific and Industrial Research (WRI)
► Ghana Water Company Ltd, Head Office (GWCL)
► Dept. of Agriculture Economy & Agriculture Business, College of Agric. and Consumer Service
► Environmental Protection Agency (EPA)
► Centre for Environmental Impacts Analysis (CEIA)
► Volta Basin Development Foundation (VBDF)
► Training, Research Network for Development (TREND)
► UDS: Faculty of Integrated Development Studies
► Savannah Agricultural Research Institute (SARI)
► Volta River Authority (VRA)


Data Management: Problems

[Figure: bar chart "Problems to get information about data concerning"; categories: data provider, data quality, data content / format, use rights / prices, software the data has been processed with; vertical axis: votes (0 to 10). Based on the survey questionnaire; number of participants = 19.]

Results

Lack of information about data, i.e. of data about data (metadata)


Data Management: Problems

[Figure: bar chart "Documentation of data available by"; categories: meta database (internal), digital catalog, catalog on paper, without documentation; vertical axis: votes (0 to 7). Based on the survey questionnaire; number of participants = 19.]

Results

Data are documented mainly in internal digital catalogues (e.g. Excel tables), on paper, or not at all

Web-based, searchable meta databases are the exception


Data Management: Problems

[Figure: bar chart "Data transfer to partners by"; categories: email, CD / DVD sent by post, web download, picking up personally, other ways; vertical axis: votes (0 to 11). Based on the organizations represented by the questionnaire participants; multiple choice; N = 19.]

Results

Data transfer is laborious and time-consuming

Sending data by email causes problems because of data volumes and transfer times


Data Management: Problems

[Diagram: a data user facing data held separately by an organization, a service department and an institution]

Common questions encountered when searching for data:

► Which data exist that can serve my research / decision / information requirements?
► Where are the data available?
► How can I get the data with little effort?
► What are the formats of the data? Are they compatible with my applications / models?
► What are the data characteristics (e.g. time steps, units ...)?
► Who owns the data? Are there costs?


Solution: Data Management

To solve these problems, the GVP would like to offer you:

a centrally hosted database which provides
► access to the GVP data stock
► the option to extend the data stock with your own data

a centrally hosted metadatabase giving
► information about the data needed
► references to data providers

a geoportal informing, in a spatial visualization,
► about projects related to water management in the Volta Basin
► about their data

[Diagram: the organization (host) with GVP data, metadata, data server and map server, connected via the web and the geoportal to the data of service departments and institutions]


Data

Project data: what can GVP provide?

Hydrological data: water discharge, groundwater (time series) ...

Climatological data: precipitation, temperature, air humidity, evapotranspiration, heat flux (time series and forecasts) ...

Water use data: agricultural (irrigation) / domestic / industrial (hydropower) / reservoirs …

Land use / land cover data: agriculture, urbanization, soil, geology, vegetation ...

Topographic / infrastructure / administrative (basic) data: river networks, lakes, elevation, roads, settlements, electricity, boundaries ....

Socio–economic data: demography, census data, economic activities (markets), surveys ...

in several formats: vector / raster data (remote sensing), tables, documents, model specific formats ...


Solution: Data Management

Data Management is the holistic background in which data access facilities are embedded

Data Management in an organization is based on a variety of methods for

Data description (meta data)

Data organization

Data quality assurance

Data access and distribution

Security


Solution: Data Management

Data Management in an organization is practically based on

Standards

global standards, e.g. for metadata, resource identification, formats ...

internal standards agreed by consensus inside the organization, e.g. database models, file naming, data policy ...

Workflows / Process Steps / Responsibilities

Technology: hardware, software, interfaces ... data infrastructure


Data Management: Metadata

Metadata Standards:

several standards developed by standardization organizations, such as the Federal Geographic Data Committee's (FGDC) standard and ISO 19115 for geodata

registered by the International Organization for Standardization (ISO)

consisting of a range of elements / fields to describe resources (data, software, services)

some metadata standards consist of several hundred elements


Data Management: Metadata

Metadata Standard in the GVP: Dublin Core (DCMES)

core of 15 elements, extended by some special elements for geodata

all elements, except title and identifier, are optional

understandable element description

every kind of resource (data, software, model, …) can be described

Searchable metadata elements like

“Subject”: the topic is categorized using keywords, key phrases or classification codes

“Publisher”: an entity (institution, person) responsible for making the resources available

“Format”: the file format, physical medium or dimensions of the resource


Data Management: Metadata

Creating metadata

Metadata should be stored in a metadatabase

hosted in a central place

providing web-based access and search interfaces to data and resource descriptions

Metadata can be created in two ways:

online: direct entry of metadata into a central metadatabase using an internet browser (JavaScript, PHP)

offline: using an internet browser and a JavaScript application, storing each metadata set locally, close to the described object, in an XML file

if metadata XML files have been created offline

► a metadata harvester can collect the local files and insert them automatically into the metadatabase on a server

► alternatively, the metadata files can be uploaded to the central metadatabase
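The offline route can be sketched in a few lines of Python. This is only an illustration, not the GVP input mask itself (which is a JavaScript application): the root element name `metadata`, the helper `make_metadata_record` and all sample values are assumptions. Only the Dublin Core namespace and the element names `title`, `identifier`, `subject`, `publisher` and `format` come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements (DCMES).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Title and identifier are the two obligatory elements in the GVP.
OBLIGATORY = ("title", "identifier")


def make_metadata_record(fields):
    """Build one metadata set as an XML element (hypothetical layout)."""
    missing = [name for name in OBLIGATORY if not fields.get(name)]
    if missing:
        raise ValueError(f"missing obligatory elements: {missing}")
    root = ET.Element("metadata")  # root element name is an assumption
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Sample values are invented for illustration.
record = make_metadata_record({
    "title": "Water level Kaburi, Jan-Jun 2002",
    "identifier": "urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630.v1.0.xls.cd",
    "subject": "hydrology; water level",
    "publisher": "Hydro Service",
    "format": "xls",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

To store the record close to the described file, `ET.ElementTree(record).write(path, encoding="utf-8", xml_declaration=True)` writes it to disk as an XML file that a harvester could later pick up.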


Data Management: Metadata

Metadata input mask

Input mask* for

► creating metadata sets as XML files
► entering metadata into the metadatabase (web / LAN)

[Screenshot of the input mask, with callouts: obligatory metadata elements; input fields for metadata elements; button that opens the URN mask]

* Developed as a prototype by Dr. Marcel Endejan, Deputy Executive Officer, GWSP, in his dissertation


Data Management: Metadata

Metadata input mask

[Screenshot of the input mask, with callouts: metadata elements; insert button]


Data Management: Internal Description

Internal file description for structured data (e.g. measurements):

data file headers giving information about content, units in use, instrumentation, quality of values, location, ...

in addition to the metadata, they provide important information to the data user / recipient

multiple similar files / data sets can be described in

► the first sheet (e.g. of an Excel file)

► the first file of a file set (referenced by the others)

► a separate text file stored close to the files


Data Management: Naming Convention

Basic determination of data categories

qualitative data: data which are rich in detail and description, usually in a textual or narrative format, e.g. case studies, document reviews, …

quantitative data: numerical data. Data which are measured on either the ratio or interval scale of measurement, e.g. temperature, water level, …

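Naming of data (recommended particularly for quantitative data) should reflect discipline, topic, site, time frame and version

Example: hyd_waterlevel_ghana-kaburi_020101-020630_v1.xls

hyd = discipline, waterlevel = topic, ghana-kaburi = site, 020101-020630 = time frame, v1 = version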
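A file name following this convention can be assembled mechanically. The Python sketch below is an illustration only: the function name and its parameters are hypothetical, and reading the time frame as YYMMDD-YYMMDD is an assumption inferred from the example.

```python
from datetime import date


def build_file_name(discipline, topic, site, start, end, version, extension):
    """Compose <discipline>_<topic>_<site>_<start>-<end>_v<version>.<ext>."""
    # Time frame written as YYMMDD-YYMMDD (assumed from the example name).
    period = f"{start:%y%m%d}-{end:%y%m%d}"
    parts = [discipline, topic, site, period, f"v{version}"]
    return "_".join(part.lower() for part in parts) + "." + extension


name = build_file_name(
    "hyd", "waterlevel", "ghana-kaburi",
    date(2002, 1, 1), date(2002, 6, 30),
    version=1, extension="xls",
)
print(name)
```

Generating names from the individual components, instead of typing them by hand, keeps the order of the components and the separators consistent across all files of a project.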

Changing current naming systems is not necessary; what is necessary is …


Data Management: Identification

Identification of resources (data, documents, maps)

… a unique identifier for each resource as a central metadata element. GVP uses the Uniform Resource Name (URN)

quasi-standard for identifying resources in information systems. Example: International Standard Book Number (ISBN)

can be used as a name for resources (e.g. file name)

has to follow a standardized syntax

URNs in the GVP can be generated easily using a resource name generator (internet browser)


Data Management: Identification

Identification: URN

standardized syntax: 'urn:' <NID> ':' <NSS>

NID = Namespace Identifier representing an organisation, project, network or person

urn:x-gvp:uid:<NSS>

urn = uniform resource name
x = experimental, not officially registered
gvp = GLOWA Volta Project
uid = user identification


Data Management: Identification

standardized syntax: 'urn:' <NID> ':' <NSS>

NSS = Namespace Specific String encoding information about the “type”, “use” and “storage medium” of the resource / data

urn:<NID>:<resType>-<resSubType>.<sTitel>.v<verNr>.<for>.<med>

<resType> = type of resource, e.g. dataset, document, software
<resSubType> = subtype of resource, e.g. primary / secondary data, model input
<sTitel> = short title, name
<verNr> = version number
<for> = format
<med> = medium on which the resource / data file is stored


Data Management: Identification

Example:

urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630-v1.0.xls.cd

gvp = GLOWA Volta Project
HD12 = institution, e.g. HD = “Hydro Service”, and editor, e.g. 12 = person xy
ds = dataset
pd = primary data
waterlevel ... 30 = short title, e.g. abbreviation for “hyd_waterlevel_ghana-kaburi_020101-020630”
v1.0 = version of dataset, e.g. raw data in 1st version (uncontrolled)
xls = MS Excel
cd = on CD
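The URN syntax can be made concrete with a small Python sketch that builds and parses such names. It is not the Resource Name Generator itself: the function names and the regular expression are illustrative assumptions. The sketch follows the syntax line and puts a dot before the version (`.v1.0`), whereas the example above writes a hyphen (`-v1.0`).

```python
import re


def build_urn(uid, res_type, res_subtype, short_title, version, fmt, medium):
    """Assemble urn:x-gvp:<uid>:<resType>-<resSubType>.<sTitel>.v<verNr>.<for>.<med>."""
    nss = f"{res_type}-{res_subtype}.{short_title}.v{version}.{fmt}.{medium}"
    return f"urn:x-gvp:{uid}:{nss}"


# Hypothetical pattern: the short title must not contain dots and the
# version is assumed to have the form <major>.<minor>.
URN_PATTERN = re.compile(
    r"^urn:x-gvp:(?P<uid>[^:]+):"
    r"(?P<res_type>[a-z]+)-(?P<res_subtype>[a-z]+)\."
    r"(?P<short_title>[^.]+)\.v(?P<version>\d+\.\d+)\."
    r"(?P<fmt>[^.]+)\.(?P<medium>[^.]+)$"
)


def parse_urn(urn):
    """Split a URN built by build_urn back into its components."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a valid GVP-style URN: {urn}")
    return match.groupdict()


urn = build_urn("HD12", "ds", "pd", "waterlevel_gh-kab_020101-020630",
                "1.0", "xls", "cd")
print(urn)
print(parse_urn(urn))
```

Because every component has a fixed position, a catalogue or metadatabase can recover the resource type, version, format and storage medium from the identifier alone.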


Data Management: Identification

Creating URNs using the Resource Name Generator*

creates URNs using a simple JavaScript application running in an internet browser; currently exists as a prototype

* Developed as a prototype by Dr. Marcel Endejan, Deputy Executive Officer, GWSP, in his dissertation


Data Management: Identification

The Resource Name Generator integrates special codes for resource types within a network that shares data

the resource types have to be identified before modeling the URN syntax, and then integrated into the script

[Screenshot of the Resource Name Generator, with callouts: Resource Type; Resource Sub-Type]


Data Management: Identification

Resource Name Generator*

version, format and storage medium are selectable

copy and paste the URN into the name of the dataset (if required) and enter it into the metadata

individual URNs will be matched against the central metadatabase, in which the data will be registered and described, to avoid duplicates

[Screenshot of the Resource Name Generator, with callouts: Version Number; Format; Storage Medium; URN]

* Developed as a prototype by Dr. Marcel Endejan, Deputy Executive Officer, GWSP, in his dissertation


Data Management: Formats

data can be stored in “proprietary” or in “non-proprietary” formats

a proprietary format encodes data in such a way that the file can only be opened with the software which was used to generate the data

non-proprietary formats can be used by a wide range of applications (mostly using import functions) and platforms, and increasingly so in the future

data have to be stored for a long period of time, and it is uncertain which programs will be used in the future

interoperability between different software applications has to be maintained for as long as possible


Data Management: Formats

internationally certified standards, like the ISO standard “Open Document Format for Office Applications” (ODF), “HTML”, “XML” or the Open Geospatial Consortium's (OGC) “GML” (Geography Markup Language)

some formats are de facto standards (like MS Excel) because the proprietary programs are used by many people

processing software widely used by the members of a data exchange framework imposes requirements on input formats


Data Management: Formats

Conclusion: try to use non-proprietary exchange formats as far as possible and consider the format requirements of the software in use

Examples:

Microsoft Word (.doc) → Rich Text Format (.rtf), Open Document Text (.odt)

MS Excel (.xls) → Comma-Separated Values (.csv), Extensible Markup Language (.xml)

ESRI shape → Geography Markup Language (GML)

Recommendation:

use open office software like OpenOffice.org

► similar in functionality to Microsoft Office (incl. Excel, Access, etc.)

► its format has been an ISO standard since 2006 (ODF, ISO/IEC 26300)

► free of charge


Data Management: Security

Safeguards against unauthorized access to, and misuse of, data and resources

Use of computing security facilities such as

► Access Control Lists (ACL)

► secure access channels like Secure Shell (SSH) technology


Data Management: Access Control

data may have been costly to create, may not be in the public domain, may not yet be published, ...

data access control is based on an agreement on data access rules within a (scientific) community of data producers, data users and data providers

This means:

► who (users, user groups) is allowed to use (get) which data under which constraints (owner rights, payment)

► how to organize the authentication process schematically: user groups with graduated access rights

► how to implement the authentication process on a technical level
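The idea of user groups with graduated access rights can be illustrated with a short Python sketch. The access levels, group names and the `may_access` helper are hypothetical; the actual rules have to be agreed within the community as described above.

```python
# Hypothetical access levels, ordered from most open to most restricted.
ACCESS_LEVELS = ["public", "partner", "project"]

# Hypothetical user groups and the highest level each group may read.
GROUP_CLEARANCE = {
    "guest": "public",
    "partner_institution": "partner",
    "gvp_staff": "project",
}


def may_access(user_group, dataset_level):
    """Return True if the group's clearance covers the dataset's level."""
    # Unknown groups are treated like guests (assumption of this sketch).
    clearance = GROUP_CLEARANCE.get(user_group, "public")
    return ACCESS_LEVELS.index(clearance) >= ACCESS_LEVELS.index(dataset_level)


print(may_access("guest", "partner"))
print(may_access("gvp_staff", "partner"))
```

Keeping the rules in one table separates the agreement (who may get what) from its technical implementation, which corresponds to the last two points of the list.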


Quality assurance for data

Data quality means: the state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use using computing facilities

From a comprehensive point of view, quality assurance is provided by data management as the overarching concept

Software-based methods, linked with specific scientific disciplines,

► have to be transparent and comprehensible

► should be declared (recommended) within a scientific or administrative network

The level of quality must be described within the data file, within the metadata, ...

Data Management: Quality


Data Management: Quality

Quality assurance for data in the GVP

Is done by the scientists within their own discipline, on their own responsibility

► Checking with diagrams whether data are consistent

► Comparisons with other data sources

► Routine recalibration of instruments

► Programmed limit checks

► Basic statistics
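Programmed limit checks and basic statistics of this kind could look as follows in Python. This is a minimal sketch under assumptions: the function name, the plausible range and the sample values are invented for illustration; the missing-value code -9999.9 is the one suggested for table headers later in this presentation.

```python
import statistics

MISSING = -9999.9  # missing-value code used in the data file header


def quality_report(values, lower, upper, missing=MISSING):
    """Count gaps, flag values outside [lower, upper], summarize the rest."""
    valid = [v for v in values if v != missing]
    out_of_range = [v for v in valid if not lower <= v <= upper]
    in_range = [v for v in valid if lower <= v <= upper]
    return {
        "n_total": len(values),
        "n_missing": len(values) - len(valid),
        "out_of_range": out_of_range,
        "mean": statistics.mean(in_range) if in_range else None,
        "stdev": statistics.stdev(in_range) if len(in_range) > 1 else None,
    }


# Invented water levels in metres; 25.0 is an implausible outlier.
levels = [1.2, 1.4, MISSING, 1.3, 25.0]
report = quality_report(levels, lower=0.0, upper=10.0)
print(report)
```

The report only flags suspicious values. Whether they are deleted, corrected or kept remains a decision of the responsible scientist and should be documented.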


Data Management: Challenges

Getting benefits from data management requires the effort of all participants

DM needs firm agreements with regard to

► standards

► selected data users and their capabilities in accessing and using data

► the technical environment, such as software (interfaces), network protocols, etc.

► personal and / or institutional responsibilities within a data management workflow:

data production → quality control → naming, identifying → description → transfer to data host → delivery from data host

DM requires the willingness to invest time and to adhere to the standards!


Data Management Workflow

The next slides are part of a digital GVP data management workflow manual and documentation

It will be completed and published at the beginning of 2008

It is the background for the next training sessions on web-based data management and (geo)database administration

The workflow manual will be offered in a similar design but in different formats (PDF, HTML), so that it can be delivered or published on the web

It serves as good practice in the GVP, but has to be extended to fit further requirements on the system from the stakeholders' side, after the GVP!


Data Management Workflow

[Diagram: overview of the seven linked workflow steps (1 to 7), including the transfer to the data host]


Data Management: Workflow Steps

Step 1: Data Collection

Processes: survey; data logger download; surveying & mapping

Location: field; site

Processor: scientist; planner; data collector

Software / Interfaces: file explorer; download interface; GPS tracking; data processing software; hardcopies

Hardware: data logger; laptop; GPS; ...

Recommendations (in note form)

Take notes in a log book about
► measurement device: name, manufacturer, serial number
► date: when the data were collected
► name of the person who collects the data in the field
► what has been done: maintenance, calibration
► particularities: could anything special be observed?

GPS measurements and mappings
► choose the appropriate coordinate system for the spatial working area
► for Ghana: coordinate system WGS 1984 projected in UTM (zones 30/31N); for Burkina Faso: zones 30/31P
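The zone designations quoted for Ghana and Burkina Faso follow directly from the UTM grid definition, which can be checked with a few lines of Python. This is only a sketch: the function names and the sample coordinates (approximate positions of Accra and Ouagadougou) are illustrative assumptions, and the special zone exceptions around Norway and Svalbard are ignored because they do not affect West Africa.

```python
import math

# Latitude band letters from 80°S to 84°N (I and O are not used).
BAND_LETTERS = "CDEFGHJKLMNPQRSTUVWX"


def utm_zone(longitude):
    """Return the UTM zone number (1 to 60) for a longitude in degrees."""
    zone = int(math.floor((longitude + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude 180 is assigned to zone 60


def latitude_band(latitude):
    """Return the latitude band letter for a latitude in degrees."""
    if not -80.0 <= latitude <= 84.0:
        raise ValueError("UTM is only defined between 80°S and 84°N")
    index = int(math.floor((latitude + 80.0) / 8.0))
    # The northernmost band X spans 12 degrees, so clamp the index.
    return BAND_LETTERS[min(index, len(BAND_LETTERS) - 1)]


def utm_grid_zone(latitude, longitude):
    """Combine zone number and band letter, e.g. for documentation."""
    return f"{utm_zone(longitude)}{latitude_band(latitude)}"


accra = utm_grid_zone(5.6, -0.2)          # approximate position of Accra
ouagadougou = utm_grid_zone(12.4, -1.5)   # approximate position of Ouagadougou
print(accra, ouagadougou)
```

With these inputs the sketch yields 30N for Accra and 30P for Ouagadougou. The letter is the latitude band (N covers 0° to 8°N, P covers 8°N to 16°N).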


Data Management workflow steps

Step 2: Quality Control

Processes: searching for gaps, outliers, file damages; deleting data errors; filling gaps; documenting

Location: field; site; office

Processor: scientist; data collector

Software / Interfaces: statistical methods (algorithm); data processing programs (e.g. HYDAT)

Hardware: laptop; PC; workstation

Recommendations (in note form)

Data Quality Assurance
► example methods (Julia, Uli)

Documentation
► which uncertainties remain
► what was done for quality control
► specific algorithms and software used
► note it in the metadata
► note it in table headers

[Figure: precipitation, Sept. 05 - June 06; vertical axis in mm (0 to 250)]


Data Management Workflow Steps

Step 3: Naming, URN

Processes: designing an appropriate name syntax; naming of resources; creating URNs

Location: office

Processor: scientists; planners; database administrator

Software / Interfaces: file explorer; Internet Explorer; HTML, JavaScript

Hardware: laptop; PC; workstation

To consider (in note form)

data name reflecting
► topic of content
► spatial and temporal coverage
► status of processing (version)

local data sharing (e.g. office with network)
► find an agreement about file name syntax (GVP standard?)
► identify resource / data types to define a URN syntax (GVP standard?)

assign a Uniform Resource Name
► use the Resource Name Generator
► store the URN within the data sets
► store the URN in the local data catalogue
► store the URN when creating metadata


Data Management Workflow Steps

Step 4: organization of data

Processes: designing an appropriate storage structure (directories) on the file system

Location: office

Processor: scientists; planners; network administrator

Software / Interfaces: file explorer / manager

Hardware: laptop; PC; LAN (server)

Recommendations (in note form)

Directory Structure

especially important
► when data or resources are shared within an office community
► within small Local Area Networks (LAN)
► within peer-to-peer networks

can be designed around
► the data processing framework (models etc.)
► the project structure (subprojects, project hierarchy)
► the spatial, temporal or thematic content of the data stock (e.g. by regions, themes ...)

should be mirrored on local drives by all participants of the network, adjusted to personal focal points of work

makes it easier to find resources


Data Management Workflow Steps

Step 4: organization of data

Processes: insert information about data into a data dictionary

Location: field; site; office

Processor: scientist; planner

Software / Interfaces: Excel; OpenOffice Calc

Hardware: laptop; PC; workstation

Recommendations (in note form)

Data Catalogue

a small table file with a registration of own data, scripts, etc. on local drives

provides an overview and saves time

minimum elements should be:
► Uniform Resource Name (URN)
► Title / Name
► Short Description
► Format
► Storage Location (path)

Example from GVP
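A local data catalogue with these minimum elements does not need more than a CSV table, which both Excel and OpenOffice Calc can open. The following Python sketch shows one possible layout; the column names, the helper functions and the sample entry are assumptions for illustration, not the GVP catalogue itself.

```python
import csv
import io

# Minimum elements of a catalogue entry (column names are assumptions).
CATALOGUE_FIELDS = ["urn", "title", "description", "format", "path"]


def write_catalogue(entries, stream):
    """Write catalogue entries as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=CATALOGUE_FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def read_catalogue(stream):
    """Read the catalogue back as a list of dictionaries."""
    return list(csv.DictReader(stream))


# Invented sample entry.
entries = [{
    "urn": "urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630.v1.0.xls.cd",
    "title": "Water level Kaburi",
    "description": "Daily water level, January to June 2002",
    "format": "xls",
    "path": "data/hydrology/kaburi/",
}]

buffer = io.StringIO()  # stands in for a file on a local drive
write_catalogue(entries, buffer)
buffer.seek(0)
catalogue = read_catalogue(buffer)
print(catalogue[0]["title"], "->", catalogue[0]["path"])
```

For a real catalogue the in-memory buffer would be replaced by `open("catalogue.csv", "w", newline="")`, where the file name is again only an example.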


Data Management Workflow Steps

Step 4: organization of data

Processes: insert dataset information directly into, or close to, the file

Location: field; site; office

Processor: scientist; planner

Software / Interfaces: processing software; file explorer

Hardware: laptop; PC; workstation

Recommendations (in note form)

table header with details on
► Uniform Resource Name: ['urn:' <NID> ':' <NSS>]
► Data provided by: [surname, first name, email address, institution]
► Location: [name of location, UTM coordinates (X, Y)]
► Elevation: [m above sea level]
► Measuring Design: [description of applied methods]
► Measurement Executer: [name, (project, institution)]
► Measuring period: [YYYYMMDD – YYYYMMDD, time steps (d/h/s, minutes)]
► Missing values: [-9999.9]
► Quality: [description of quality assurance methods]
► Notes: [remark]

table header with description of the parameters in use
► explain the meanings of abbreviations / codes
► declare the units used for the parameters if they are not self-explanatory

use information from the data collection log book

red font = metadata elements (if a metadata file has already been created, these are not necessary in the table header!)
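Writing such a header in front of the data rows can be automated. The Python sketch below assumes a plain-text (CSV) data file and marks header lines with `#`; the comment marker, the function name, the column names and all sample values are assumptions for illustration.

```python
import io


def write_data_file(stream, header, columns, rows, comment="#"):
    """Write header details as comment lines, then column names and rows."""
    for key, value in header.items():
        stream.write(f"{comment} {key}: {value}\n")
    stream.write(",".join(columns) + "\n")
    for row in rows:
        stream.write(",".join(str(value) for value in row) + "\n")


# Invented header values following the elements listed above.
header = {
    "Uniform Resource Name": "urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630.v1.0.xls.cd",
    "Location": "Kaburi",
    "Measuring period": "20020101 - 20020630, time step d",
    "Missing values": "-9999.9",
    "Parameters": "water_level_m = water level in metres",
}

buffer = io.StringIO()  # stands in for the data file
write_data_file(
    buffer, header,
    columns=["date", "water_level_m"],
    rows=[("20020101", 1.32), ("20020102", -9999.9)],
)
text = buffer.getvalue()
print(text)
```

Most processing software can skip comment lines on import, so the description travels with the data without disturbing the table itself.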


Data Management Workflow Steps

data file header: example

[Screenshot: example of a data file header]


Data Management Workflow Steps

Step 5: description, create metadata

Processes: description of data / resources following the metadata standard

Location: office

Processor: data producer

Software / Interfaces: internet browser (HTML, JavaScript)

Hardware: laptop, PC, workstation

To do (in note form)

Metadata

data should be described by metadata at the latest when they are going to be published

use the internet browser interface (as described here) for entering metadata

try to fill out as many elements as possible

the accurate use of keywords in the element “subject and keywords” is very important: most queries to the metadata address “subject and keywords” as well as “spatial coverage”


Data Management Workflow Steps

Step 5: create metadata

Processes: description of data / resources following the metadata standard

Location: office

Processor: data producer

Software / Interfaces: internet browser (HTML, JavaScript)

Hardware: PC, workstation

To consider (in note form)

Metadata

don’t forget to give access information about the data / resource
► current location: where the resource can be retrieved
► access modalities (costs, user rights, technical way of retrieving, etc.)
► if data are not transmitted to the central host: local contact person

Metadata storage files

if direct input to the metadatabase is not possible (no internet connection), XML metadata files are to be sent to the administrator of the central metadatabase, e.g. on CD by postal service

Data and metadata

metadata only have to be created if further use of the resources by others is intended
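When the browser interface cannot be reached, a metadata record can be prepared offline as an XML file. The sketch below assumes Dublin Core style elements, which the element names “subject and keywords” and “spatial coverage” suggest. The actual schema of the GVP metadatabase is not shown in these slides, so the root element name `metadata`, the function name `build_metadata`, and all values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)

def build_metadata(record):
    """Build an XML tree with one dc:<element> child per record entry.
    List values produce one child element per item (e.g. several keywords)."""
    root = ET.Element("metadata")  # hypothetical root element name
    for element, value in record.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = item
    return root

# Hypothetical record, for illustration only
record = {
    "title": "Daily precipitation, example station",
    "creator": "Doe, Jane",
    "subject": ["precipitation", "Volta Basin"],
    "coverage": "Example site, Burkina Faso",
    # access information: who may use the data and whom to contact
    "rights": "project partners only; contact the data provider",
    "identifier": "urn:example:dataset-0001",
}

root = build_metadata(record)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The `rights` entry is where the access modalities and the local contact person would be recorded in this sketch.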


Data Management Workflow Steps

Step 6: (preparing to) transfer

Processes: decision making on the publishing of data, access constraints (users), transmission to the central database

Location: collective institution, local offices

Processor: data user framework, database administrators

Software / Interfaces: not specified

Hardware: not specified

To do (in note form)

Make a decision

whether datasets or resources (e.g. software, models) should be shared

who (persons, institutions, partners) should have access to the data

whether there should be a payment for datasets

where the accessible datasets should be stored: locally or on a central server

who is the responsible person controlling the transmission to a central database. This person in charge has to check that
► the resources / datasets meet the data management standard of the community
► in particular, the data have proper metadata including a clear definition of use rights (provide the database administrator with a list of potential user groups)


Data Management Workflow Steps

Step 7: transfer

Processes: formatting, upload to central database

Location: local office, central database host

Processor: data producer, central database administrator

Software / Interfaces: data processing software; HTML, JavaScript; SSH (e.g. WinSCP)

Hardware: PC, server

To do (in note form)

Preparing the transfer

reformat the datasets, if required

inform the central database administrator
► which datasets are going to be uploaded to the central database and why
► that metadata are entered directly into the metadatabase using the web interface
► that metadata files are transmitted together with the datasets

Do the transfer

upload the data to a “transfer” directory on the main server

use upload software based on FTP (File Transfer Protocol) or SFTP (SSH File Transfer Protocol) if the facilities are available

GVP uses SFTP for transferring data to the data server

if upload is not possible because of a slow internet connection, send the data by postal service on CD / DVD
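The upload to the “transfer” directory can be scripted. As a minimal sketch, the Python function below only generates the text of a batch file for the OpenSSH `sftp` command-line client (`cd`, `put`, and `bye` are standard sftp commands); it does not connect to any server. The function name `build_sftp_batch`, the file names, and the host name in the comment are hypothetical.

```python
def build_sftp_batch(local_files, remote_dir="transfer"):
    """Return the text of an sftp batch file that uploads each local file
    into remote_dir and then closes the session.
    Note: file names containing spaces would need quoting; not handled here."""
    lines = [f"cd {remote_dir}"]
    for path in local_files:
        lines.append(f"put {path}")
    lines.append("bye")
    return "\n".join(lines) + "\n"

# Hypothetical file names: each dataset travels with its metadata file
files = ["rainfall_2007.csv", "rainfall_2007.dc.xml"]
batch_text = build_sftp_batch(files)
print(batch_text)

# To run the transfer (not executed here), save batch_text to a file and call:
#   sftp -b upload.batch user@data-server.example.org
```

Listing the dataset and its metadata file in the same batch keeps the two together, as the preparation notes require.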


GVP-Data Infrastructure

[Figure: architecture of the GVP data infrastructure, split into an internet zone and an intranet zone. Components shown: data server (+RAID); web server (VM); file system (Samba); ESRI geodatabase; MySQL/Postgres with metadatabase and portal database; GLOWA Volta HP; ESRI ArcGIS clients; Mapbender incl. PostgreSQL; MapServer; Apache; Catalog Manager incl. phpMyAdmin; portal; Tomcat (JSP / Java) with Java-based client (COBIDS); local/offline JavaScript metadata interface (xml/xsl); Meta.dc.xml (describes file). Connection labels: SMB, JDBC:1521, ADODB:1521, PHP (CGI), CGI, PHP/DOM, request to download.]


GVP-Data Infrastructure

Don’t be shocked: that’s technical stuff. Let’s look at it from the user’s side.


GVP-Data Infrastructure

Harvesting the fruits:

a centrally hosted database
► giving access to the GVP data stock
► with the option to extend the data stock with your own data

a centrally hosted metadatabase giving
► answers about the data needed
► references to data providers

a geoportal informing
► about projects related to water management in the Volta Basin
► and their data, in a spatial visualization

[Figure: diagram with the elements organization (host), service department, GVP data, data server, map server, metadata, geoportal, web, institution, and data user.]


GVP-Data Infrastructure

[Figure: diagram with the elements service department, GVP data, data server, map server, metadata, geoportal, web, institution, and data user, grouped into user interfaces and databases.]

View of the background:

user interfaces
► Geoportal
► Internet Explorer with database interfaces

databases
► portal database
► metadatabase
► file system
► geodatabase

web technologies
► not today’s topic!


Data Infrastructure: Web User’s View

1. Approach to get data

Via Geoportal

1. The geoportal searches the metadatabase.

2. The Catalogue Manager software on the server provides a result list.

3. Geodata that are found can be requested either

4. ... as interactive maps, delivered to the internet browser as a WebMapService generated by UMN MapServer,

5. or as a download of the original data files (also other data), if allowed.

[Figure: data flow with arrows numbered 1 to 5 between metadata, data server, GVP data, and an ACL (access control list).]
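Step 4 relies on the OGC WebMapService interface, in which a map image is requested through a `GetMap` call. The sketch below builds such a request URL with the standard WMS 1.1.1 parameters. The server address, the layer name `rivers`, and the rough bounding box are hypothetical, and a real UMN MapServer installation usually needs further site-specific settings that are not covered here.

```python
from urllib.parse import urlencode

def getmap_url(base_url, layers, bbox, width=600, height=400,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL.
    bbox is (min_x, min_y, max_x, max_y) in the units of srs."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value = default style of each layer
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical server and layer; the box is only a rough lon/lat extent
url = getmap_url("http://maps.example.org/cgi-bin/mapserv",
                 layers=["rivers"], bbox=(-6.0, 5.0, 3.0, 15.0))
print(url)
```

Because the request is an ordinary URL, any internet browser or web client such as Mapbender can fetch the resulting image.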


Data Infrastructure: Web User’s View

2. Approach to get data

Via Resource / Data Catalogue

1. The internet browser interface (on the homepage) searches the metadatabase.

2. The Catalogue Manager software on the server provides a result list.

3. Geodata that are found can be requested either

4. ... as interactive maps, delivered to the internet browser as a WebMapService generated by UMN MapServer,

5. or as a download of the original data files (also other data), if allowed.

[Figure: data flow with arrows numbered 1 to 5 between metadata, data server, GVP data, and an ACL (access control list).]
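The search and the “if allowed” check in these steps can be illustrated with a small in-memory stand-in. This is not how the Catalogue Manager works internally, which the slides do not describe. It is only a sketch that filters hypothetical metadata records by keyword and bounding box, then checks a download against a per-record list of permitted user groups.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    title: str
    keywords: set   # lower-case subject keywords
    bbox: tuple     # spatial coverage as (min_x, min_y, max_x, max_y)
    allowed_groups: set = field(default_factory=set)  # simple ACL

def overlaps(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(records, keyword, area):
    """Return records tagged with keyword whose bbox intersects area."""
    return [r for r in records
            if keyword.lower() in r.keywords and overlaps(r.bbox, area)]

def may_download(record, user_groups):
    """ACL check: allowed if the user belongs to a permitted group."""
    return bool(record.allowed_groups & set(user_groups))

# Hypothetical catalogue entries, for illustration only
catalogue = [
    Record("Rainfall, station A", {"precipitation"},
           (-2.0, 10.0, -1.0, 11.0), {"partners"}),
    Record("Land cover map", {"land cover"},
           (-6.0, 5.0, 3.0, 15.0), {"public"}),
    Record("Rainfall, station B", {"precipitation"},
           (20.0, 40.0, 21.0, 41.0), {"public"}),
]

hits = search(catalogue, "precipitation", area=(-6.0, 5.0, 3.0, 15.0))
print([r.title for r in hits])
print(may_download(hits[0], ["public"]))
```

The two filter criteria mirror the metadata elements that most queries address: “subject and keywords” and “spatial coverage”.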


Data Infrastructure: Geoportal

Layer

(selected) Feature Info: attribute table with properties, and links to documents / graphics / web addresses

Map Tools: zoom, pan, select

Overview

Geoportal Components

WebMapService (WMS): an OGC standard

Mapbender: free client software for map servers (server-side)

UMN MapServer: free and widely used map server software

Geoportal interface software: currently under development by J. Laubach (Institute for Computer Science III, University of Bonn)


Data Infrastructure: Interfaces

[Figure: the data server (Linux), with ESRI geodatabase and file system (Samba), and the client interfaces shown for web and intranet: GIS client, internet browser, file explorer, and SSH client (authorized direct access only).]


GVP Data Infrastructure

Intranet view of the data server file system

[Figure: file system view with the labels User Group 1, User Group 2, User System, and data users.]


GVP Data Infrastructure: Geodatabase

“Geodatabase” = geodatabase format from ESRI

Relational database

all data (entities = objects = layers) are organized in tables

tables can be related to each other
► by using keys
► based on cardinality (1:1, 1:n, m:n relationships)

Common GIS formats (shapefile, ArcInfo coverage, ...)

are organized in several single files representing an object class
► for geometry
► for attributes
► for linkage geometry <--> attributes
► etc.
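To make “tables related by keys” concrete, the sketch below sets up a 1:n relationship between two tables and joins them. It uses SQLite from the Python standard library purely as a stand-in relational database, not an ESRI geodatabase; the table names, columns, and values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
CREATE TABLE station (
    station_id INTEGER PRIMARY KEY,   -- key of the parent table
    name       TEXT NOT NULL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    station_id     INTEGER NOT NULL REFERENCES station(station_id),  -- foreign key
    day            TEXT NOT NULL,
    rainfall_mm    REAL
);
""")
con.execute("INSERT INTO station VALUES (1, 'Station A')")
con.executemany(
    "INSERT INTO measurement (station_id, day, rainfall_mm) VALUES (?, ?, ?)",
    [(1, "20070101", 0.0), (1, "20070102", 12.5)],
)

# Join the tables through the key: one station, many measurements (1:n)
rows = con.execute("""
    SELECT s.name, m.day, m.rainfall_mm
    FROM station AS s
    JOIN measurement AS m ON m.station_id = s.station_id
    ORDER BY m.day
""").fetchall()
print(rows)
```

An m:n relationship would need a third table holding pairs of keys, one from each of the two related tables.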


GVP Data Infrastructure: Geodatabase

A relational database is managed by a database management system, e.g. MS Access, DB2, Oracle, MySQL

An ESRI geodatabase is managed by the ArcGIS application “ArcCatalog”


GVP Data Infrastructure: Geodatabase

A geodatabase provides comprehensive facilities for
► storing a collection of different geodata types in a central place
► applying sophisticated relationships and rules to the data
► modeling complex spatial behaviour (topology, geometric networks, ...)
► maintaining data integrity
► easy scaling of the data storage
► defining custom objects


GVP Data Infrastructure: Geodatabase

(Relational) Geodatabase in the GVP
► only prototypes
► format not yet used in the GVP
► establishment of a geodatabase in the GVP still under discussion

Advantages of a geodatabase
► well-organized geodata
► high information level (modeling)
► database facilities to ensure data integrity (quality)

Disadvantages of a geodatabase
► licences for upgrading the ArcView client software (costs)
► high effort for creating the database


GVP Data Infrastructure: Geodatabase

Alternatives for geodata storage in the GVP

Geodata in common GIS formats (shapefile, ...) within a file system, as at present

Open source (free) geodatabases (PostgreSQL/PostGIS)
► no licence costs for installation, but costs for support
► not easy to administer
► poor connectivity between ESRI and Postgres
► but the use of open source and free geodatabases should be considered in the further development of the GLOWA Volta Project and its partners