Long-term preservation of digital geospatial data: challenges for ensuring access and encouraging reuse Anne Robertson, EDINA & Steve Morris, NCSU Libraries EDINA National Data Centre University of Edinburgh North Carolina State University Libraries NCGDAP Architecture Working Group OGC TC/PC Meeting Bonn, 9th November 2005

Long-term preservation of digital geospatial data: challenges for ensuring access and encouraging reuse

Anne Robertson, EDINA & Steve Morris, NCSU Libraries

EDINA National Data Centre, University of Edinburgh

North Carolina State University Libraries, NCGDAP

Architecture Working Group, OGC TC/PC Meeting

Bonn, 9th November 2005

Objectives

Why we’re here…

• Introduce preservation and access use cases to OGC
• Find points of intersection with OGC initiatives
• Flesh out research agenda for preservation of geospatial digital data
• “Permanent access and reuse”, not just preservation

North Carolina Preservation Partners

• North Carolina State University Libraries
  – University-wide GIS services since 1992
  – New focus on publishing WMS services for use by external clients or service aggregators
  – Archiving local agency geospatial data since 2000
• NC Center for Geographic Information & Analysis
  – State government GIS agency
  – Maintains state’s Corporate Geographic Database
  – Coordinates many SDI initiatives, including NC OneMap
• NC OneMap
  – Seamless access to local, state, and federal data; component part of National Map
  – WMS services available individually from sources or through aggregator viewer
  – Focus on standards, best practices, data sharing agreements, inventories, and metadata outreach

NC Geospatial Data Archiving Project

• Cooperative project with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
  – One of 8 NDIIPP partnership projects, others focusing on web pages, numeric data, video, business records, etc.
  – Focus on developing a network of partners, identifying preservation issues in various domain areas
• NCGDAP: 3 year project focused on preservation of state and local agency digital geospatial data
  – Identify and acquire data
  – Develop digital repository; ingest and manage content
• Objective: engage existing spatial data infrastructures in process of data preservation

NCGDAP Project Phases

• Content Identification and Selection
  – Work from existing inventory processes
  – Select from among “early”, “middle”, and “late” stage information products
• Content Acquisition
  – Acquire state and local agency content
  – Investigate methods of automating archive development
• Partnership Building
  – Work within NC OneMap framework (infrastructure)
  – Several other emerging geo-preservation projects
• Content Retention and Transfer
  – Metadata and ingest workflow
  – Emphasis on repository-agnostic approach, avoid “imprinting” one environment
  – Initially using DSpace open source software, re-ingest into a different environment later

Common Themes – Cartographic Representation

• The counterpart to the map is not just the dataset but also models, symbology, interpretation. These key elements give real meaning – how are these captured for reuse?

Common Themes – GML for archiving?

• Interest in alternative to proprietary vector file formats
• “Permanent access” requirements:
  – profiles and application schemas widely understood and supported, avoid requiring “digital archaeology”
  – Role of GML Simple Features Specification?
• Assessing formats for preservation: sustainability factors, quality & functionality factors
• Planned environmental scan of existing GML profiles and application schemas
  – Collaboration with National Archives and Records Administration and FGDC Historical Data Working Group
  – Vendor support? Official status? Stability over time?
• How to handle proprietary formats?
  – UC Santa Barbara/Stanford NDIIPP project working on format registry
  – Spatial databases pose special challenges

Common Themes – Content replication

• Need efficient means to replicate content to archive
  – North Carolina: 100 counties and 140 municipalities
• Content replication also needed for:
  – Disaster preparedness
  – State and federal data improvement projects
  – Aggregation by regional geospatial web service providers
• WFS, e.g.: efficiency in complete content transfer?
• Rsync-like function, plus: rights management, inventory processes, metadata management, informed by data update cycles
• Archiving delta files vs. complete replication – need to avoid requiring “digital archaeology” in the future
• Other models: LOCKSS (Lots of Copies Keeps Stuff Safe)
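
The "rsync-like function" raised on this slide could be approximated by comparing checksum manifests of the source holdings and the archive copy. The sketch below is only an illustration of that idea, not the NCGDAP implementation: the function names `build_manifest` and `diff_manifests`, and the use of SHA-256, are assumptions introduced here.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(source: dict[str, str], archive: dict[str, str]) -> dict[str, list[str]]:
    """List files that are new, missing, or altered at the source
    compared with the archive's last-known manifest."""
    shared = set(source) & set(archive)
    return {
        "added": sorted(set(source) - set(archive)),
        "removed": sorted(set(archive) - set(source)),
        "changed": sorted(name for name in shared if source[name] != archive[name]),
    }
```

A delta listing like this only tells the archive what to fetch. Whether the archive then stores deltas or a complete fresh copy is the separate question the slide raises about avoiding future "digital archaeology".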

Common Themes – Time versioning

• How to manage datasets that change over time?
  – Versions will live in different repositories, must handle relationships outside of the individual repository
• Industry focus on most current data … but increased demand for temporal data
  – e.g., land use change detection, business trends analysis
  – Much older data lost – “Digital dark age”
• Draft NCGDAP approach: manage information for “serial objects” separately, link to serial entity via persistent identifier (Handle)
  – Support “get current data/metadata/DRM” operations
  – Avoid managing volatile information (e.g., service connections) in individual static metadata records
  – Other technologies: OpenURL for service connections?
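
One way to picture the draft "serial object" approach is a small record kept apart from the individual snapshots, which links them by persistent identifier and answers a "get current data" request. The following is a hedged sketch under assumptions: the class names, fields, and handle values are hypothetical and are not taken from the NCGDAP design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Version:
    """One static snapshot of a dataset, held in some repository."""
    handle: str          # persistent identifier of this snapshot (hypothetical)
    snapshot_date: date
    repository: str      # name of the repository holding the snapshot


@dataclass
class SerialObject:
    """Record for a dataset that changes over time, managed separately
    from the static metadata of its individual snapshots."""
    handle: str                         # persistent identifier of the serial entity
    title: str
    service_url: Optional[str] = None   # volatile information lives here only
    versions: list[Version] = field(default_factory=list)

    def add_version(self, version: Version) -> None:
        self.versions.append(version)

    def current(self) -> Version:
        """Return the most recent snapshot ("get current data")."""
        if not self.versions:
            raise LookupError(f"no versions registered for {self.handle}")
        return max(self.versions, key=lambda v: v.snapshot_date)
```

Because the service connection is stored once on the serial record, a change of endpoint would not require editing every archived snapshot's metadata.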

EDINA

• A National Data Centre for Tertiary Education since 1995
  – based at the University of Edinburgh Data Library
• Our mission... to enhance the productivity of research, learning and teaching in UK higher and further education
• GeoServices team - provide SDI components to UK academic sector
• Substantial experience in handling and delivering key geospatial data and geo-referenced information
• OGC members since 1999
• Strategic move toward interoperability & shared services role
  – use of OGC interface specifications in our projects and services

GRADE project introduction

According to the OECD Follow up Group on Issues of Access to Publicly Funded Research Data¹ …

“More widespread and efficient access to and sharing of research data will have substantial benefits for most areas of scientific research.”

Evidence of re-use of data within UK data centres is low:

– “Level of re-use of data held in the AHDS and ESRC archives has been disappointingly low” (Alison Allden, 2003)

– “NERC spends about £5 million per annum on data management, but unclear what benefit it derives from this. More research is needed to establish benefits and value of data re-use” (Mark Thorley, 2003)

– Qualidata survey of qualitative data re-use (2000): 44% of respondents used a colleague's data rather than acquiring archived data via a dissemination service (33%)

¹ Interim Report, 20 October 2002

GRADE project introduction

• Within UK academia there is a focus on the potential use of digital repositories to assist with a variety of facets of digital asset management including encouraging reuse of research data
• GRADE will investigate and report on the technical and cultural issues around the reuse of geospatial data within the context of discipline-based repositories
• Particular focus on sharing and reuse of derived geospatial data
• EDINA leading GRADE with consortium partners:
  – AHRC Research Centre for Studies in Intellectual Property and Technology Law, School of Law, Edinburgh University
  – National Oceanography Centre, Southampton University
  – Variety of other associate partners including NCGDAP, British Atmospheric Data Centre, Ordnance Survey

Common Themes – Digital Rights

• UK environment, a complex one
  – dominant provider of base vector geospatial data
  – array of space borne survey data available, much free for non-commercial use
  – Stakeholder interest from research funders (research councils) and research hosts (institutions)
• When we consider the reuse of derived geospatial data, concerns over data ownership, IPR and copyright often suppress any initial enthusiasm
• We can offer the geoDRM discussion real scenarios of
  – IPR issues for derived geospatial data and
  – Geospatial data reuse/sharing use cases

Derived Data Example

[Workflow diagram with stages: Input, Processing, Processing, Output]

Diagram labels: OS Landline; Ground survey; Historic OS Maps; 2001 Orthophotos; Scan; Geo-reference; Accuracy assessment; Planimetric correction; GPS survey; Digitise coastline positions; Calculation of cliff retreat; ESRI Shapefile and tables of retreat

Source: Use case provision of derived geospatial data as part of the GRADE project in scoping digital repositories (draft report)

Common Themes – Content Packaging

• Consider a geospatial data asset deposited into a repository; it’s more than one file:
  – GML and associated schema!
  – proprietary vector format plus cartographic representation detail
  – geodatabase
  – raster with header file
  – Data set metadata and IPR info
• What is best method to package data?
• In the eLibrary world: the Metadata Encoding and Transmission Standard (METS), IMS content package (IMS CP) and MPEG-21 DIDL for repository objects
• “Interoperable repositories need to encode, exchange and describe complex objects in agreed ways”
• What direction is the GI industry taking with content packaging?
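
To make the packaging question concrete, the sketch below builds a tiny XML manifest that records each file in a deposited asset together with its role (data, schema, metadata, and so on). It is a simplified illustration only: the element and attribute names and the function `build_package_manifest` are invented for this example and do not follow the METS, IMS CP, or MPEG-21 DIDL schemas.

```python
import xml.etree.ElementTree as ET


def build_package_manifest(asset_id: str, files: list[tuple[str, str]]) -> str:
    """Return an XML manifest listing each file in the package with its role.

    `files` is a list of (relative_path, role) pairs. The element and
    attribute names are made up for this sketch; a real repository would
    use an agreed packaging standard instead.
    """
    package = ET.Element("package", {"id": asset_id})
    for path, role in files:
        ET.SubElement(package, "file", {"path": path, "role": role})
    return ET.tostring(package, encoding="unicode")
```

For example, a GML deposit might be described with `[("cliffs.gml", "data"), ("cliffs.xsd", "schema"), ("metadata.xml", "metadata")]`, so that the schema and metadata travel with the data file rather than being inferred later.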

Common Themes – Persistent Identifiers

• Once a geospatial data asset is deposited within a repository, there is a need to be able to persistently identify this asset

• Particular repository software packages use particular schemes, e.g. Fedora uses the ‘info’ URI scheme

• Requirement to ensure identifier is actionable

• We are thinking about OpenURL Resolvers and perhaps Digital Object Identifier (DOI) for handle schemes

• What direction is GI industry taking with persistent identifiers?
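
An identifier becomes "actionable" when it can be turned into something a client can follow. As a minimal sketch, assuming identifiers are written with an `hdl:` or `doi:` prefix (a convention chosen here for illustration), the function below maps them onto the public proxy resolvers for the Handle System and for DOIs. The function name `actionable_url` and the sample identifiers are hypothetical.

```python
from urllib.parse import quote

# Public proxy resolvers for the Handle System and for DOIs.
RESOLVERS = {
    "hdl": "https://hdl.handle.net/",
    "doi": "https://doi.org/",
}


def actionable_url(identifier: str) -> str:
    """Turn 'hdl:<handle>' or 'doi:<doi>' into a URL a client can follow."""
    scheme, sep, value = identifier.partition(":")
    if not sep or scheme.lower() not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    # Keep '/' unescaped: it separates the prefix from the suffix.
    return RESOLVERS[scheme.lower()] + quote(value, safe="/")
```

The stored identifier stays independent of any one resolver address; only the lookup table would change if a resolver moved.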

Common Themes – ‘data plus services’ model

National Library of New Zealand
http://wiki.tertiary.govt.nz/static/wikifarm/InstitutionalRepositories.uploads/Main/IR_report.pdf

Conclusions

• Aim is to flesh out research agenda
• Presented 7 common themes from our work
• Shift to web services consumption poses threat to secondary archive development … but can geospatial web services be put to use in preservation processes?
• Encourage GI community to connect with these issues or outcome may be that archive community will fail to take account of OGC work
• Where to from here?

Contact details

Anne Robertson
GRADE Project Manager
Edina National Data Centre
[email protected]
GRADE web site: http://edina.ac.uk/projects/grade

Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
[email protected]
NCGDAP web site: http://www.lib.ncsu.edu/ncgdap/

Questions?