Preservation of Digital Geospatial Data: Challenges and Opportunities
Steve Morris, Head of Digital Library Initiatives, North Carolina State University Libraries
NARA Meeting, Dec. 14, 2005



Note: Percentages based on the actual number of respondents to each question 2

Outline

Digital Geospatial Data: Types
Risks to Digital Geospatial Data
Overview of NC Geospatial Data Archiving Project
Preservation Challenges and Possible Solutions


Geospatial data types: Vector data


Geospatial data types: Satellite imagery


Geospatial data types: Aerial imagery


Geospatial data types: Aerial imagery


Geospatial data types: Aerial imagery


Geospatial data types: Tabular data (w/vector)


Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC


Time series – Ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002


Today’s geospatial data as tomorrow’s cultural heritage


Risks to Digital Geospatial Data

Example formats: .shp, .mif, .gml, .e00, .dwg, .dgn, .bsb, .bil, .sid


Risks to Digital Geospatial Data

Producer focus on current data
  Time-versioned content generally not archived

Future support of data formats in question
  Vast range of data formats in use; complex

Shift to “streaming data” for access
  Archives have been a by-product of providing access

Preservation metadata requirements
  Descriptive, administrative, technical, DRM

Geodatabases
  Complex functionality


NC Geospatial Data Archiving Project

Partnership between university library (NCSU) and state agency (NCCGIA)
Focus on state and local geospatial content in North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventory information
Objective: engage existing state/federal geospatial data infrastructures in preservation


Targeted Content

Resource Types
  GIS “vector” (point/line/polygon) data
  Digital orthophotography
  Digital maps
  Tabular data (e.g. assessment data)

Content Producers
  Mostly state, local, regional agencies
  Some university, not-for-profit, commercial
  Selected local federal projects


Local Government GIS: Archival Issues

Data resources are highly distributed and subject to frequent update
More detailed, current, accurate than federal/state data resources
North Carolina local agency GIS environment
  100 counties, 95 with GIS
  85 counties with high resolution orthophotography
  Growing number of municipal systems
  Value: $162 million plus investment (est. in 2003)


Work plan in a Nutshell

Work from existing data inventories

NC OneMap Data Sharing Agreements as the “blanket”, individual agreements as the “quilt”

Partnership: work with existing geospatial data infrastructures (state and federal)

Technical approach
  METS with FGDC, PREMIS?, GeoDRM?

  DSpace now; re-ingest to different environment

Web services consumption for archival development


NCGDAP Philosophy of Engagement

Take the data as is, in the manner in which it can be obtained

Provide feedback to producer organizations/inform state geospatial infrastructure

“Wrangle” and archive data

Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement with geospatial data infrastructures are more important than the archive


Big Challenges

Format migration paths

Management of data versions over time

Preservation metadata

Harnessing geospatial web services

Preserving cartographic representation

Keeping content repository-agnostic

Preserving geodatabases

More …


Vector Data Format Issues

Vector data much more complicated than image data

‘Archiving’ vs. ‘Permanent access’
  An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access

Piles of XML need to be widely understood piles

GML: need widely accepted application schemas (like OSMM?)

The Geodatabase conundrum
  Export feature classes, and lose topology, annotation, relationships, etc.

… or use the Geodatabase as the primary archival platform (some are now thinking this way)


GIS Software Used: NC Local Agencies

[Bar chart: percentage of NC local agencies using each GIS software package, axis from 0% to 70%. Categories: ArcGIS (ESRI), ArcInfo (ESRI), ArcView 8.x (ESRI), ArcView 3.x (ESRI), ArcIMS (ESRI), GenaMap, IMAGINE, Intergraph, MapInfo, Understanding Systems, Other, Not Sure]

Source: NC OneMap Data Inventory 2004


Vector Data Format Options

Option A: use an open format and have a really unfortunate transformation and limited vendor support for the output object
Option B: use closed format but retain the original content and count on short- and medium-term vendor support
Option C: do both to buy time and look for an open, ASCII-based solution (watch GML activity)

No sweet spot, just an evolving and changing mix of flawed options that are used in combination.


Geography Markup Language Issues

GML still more useful as a transfer format than an archival format; support limited even for transfer
“Permanent access” requirements:
  profiles and application schemas widely understood and supported
  avoid requiring “digital archaeology”
  role of GML Simple Features Profile?

Assessing formats for preservation: sustainability factors, quality & functionality factors

Apply same approach to GML profiles and application schemas?


Geography Markup Language Issues

Plans for an environmental scan of existing GML profiles and application schemas, recording for each:

  schema name (e.g. OSMM, top10NL, ESRI GML, LandGML)
  responsible agency; schema has official government status?
  GML version; known unsupported GML components
  schema history; known interoperation with other schemas
  vendor support; translator support; stability over time
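The scan attributes listed on this slide could be captured in a simple record, one per profile or application schema. The following is only a sketch: the class name, field names, and the `unknown_fields` helper are hypothetical, and no values are filled in beyond a schema name taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GmlSchemaRecord:
    """One row of a hypothetical GML profile/application schema scan."""
    schema_name: str
    responsible_agency: Optional[str] = None
    official_government_status: Optional[bool] = None
    gml_version: Optional[str] = None
    unsupported_gml_components: List[str] = field(default_factory=list)
    schema_history: str = ""
    interoperates_with: List[str] = field(default_factory=list)
    vendor_support: List[str] = field(default_factory=list)
    translator_support: List[str] = field(default_factory=list)
    stability_notes: str = ""

    def unknown_fields(self) -> List[str]:
        """Names of attributes the scan has not yet filled in."""
        return [name for name, value in vars(self).items()
                if value in (None, "", [])]


# Only the schema name is known at the start of the scan.
record = GmlSchemaRecord(schema_name="OSMM")
todo = record.unknown_fields()
```

Keeping the unanswered attributes explicit makes it easy to see how much of the scan remains for each schema.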


Managing Time-versioned Content


Managing Time-versioned Content

Many local agency data layers continuously updated

E.g., some county cadastral data updated daily—older versions not generally available

Individual versioned datasets will wander off from the archive

How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?

How do we certify concurrency and agreement between the metadata and the data?


Managing Time-versioned Content

Can we manage the relationship loosely using a persistent identifier link to a parent object?

[Diagram: several version objects, each linked through a Persistent ID Resolver to a Parent Object Manager]
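One way to picture the loose parent/version relationship is a toy in-memory resolver. This is a sketch under assumptions, not the project's design: the class names, the `example:parcels` identifier, and the use of SHA-256 checksums to match a stray copy against archived versions are all illustrative.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ParentObject:
    """Hypothetical parent record holding the current metadata."""
    persistent_id: str
    current_metadata: str
    # Maps a version label (e.g. a snapshot year) to the SHA-256 of its data.
    version_checksums: Dict[str, str] = field(default_factory=dict)

    def add_version(self, label: str, data: bytes) -> None:
        self.version_checksums[label] = hashlib.sha256(data).hexdigest()

    def identify(self, data: bytes) -> Optional[str]:
        """Return the label of the archived version matching these bytes."""
        digest = hashlib.sha256(data).hexdigest()
        for label, known in self.version_checksums.items():
            if known == digest:
                return label
        return None


class Resolver:
    """Toy in-memory resolver from persistent ID to parent object."""

    def __init__(self) -> None:
        self._parents: Dict[str, ParentObject] = {}

    def register(self, parent: ParentObject) -> None:
        self._parents[parent.persistent_id] = parent

    def resolve(self, persistent_id: str) -> Optional[ParentObject]:
        return self._parents.get(persistent_id)


resolver = Resolver()
parcels = ParentObject("example:parcels", "metadata as of latest review")
parcels.add_version("2001", b"parcel snapshot 2001")
parcels.add_version("2004", b"parcel snapshot 2004")
resolver.register(parcels)

# A copy found "in the wild" carries only the persistent ID and its bytes.
found = resolver.resolve("example:parcels")
```

A user holding a stray copy can then ask the parent for current metadata, and the checksum comparison gives a rough answer to whether the copy still agrees with an archived version.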


Preservation Metadata Issues

FGDC Metadata
  Many flavors, incoming metadata needs processing

  Cross-walk elements to PREMIS, MODS?

Metadata wrapper/Content packaging
  METS (Metadata Encoding and Transmission Standard) vs. other industry solutions

  Need a geospatial industry solution for the ‘METS-like problem’

  GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGIS Web Services 3)
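A crosswalk of this kind can be sketched as a lookup table from FGDC element paths to target element paths. The four pairs below are an illustrative assumption, not an endorsed FGDC-to-MODS crosswalk, and the sample record is invented for the example; the output is a flat dictionary rather than a MODS document.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Illustrative mapping only: FGDC CSDGM element path -> MODS element path.
FGDC_TO_MODS = {
    "idinfo/citation/citeinfo/title": "titleInfo/title",
    "idinfo/citation/citeinfo/origin": "name/namePart",
    "idinfo/citation/citeinfo/pubdate": "originInfo/dateIssued",
    "idinfo/descript/abstract": "abstract",
}


def crosswalk(fgdc_xml: str) -> Dict[str, str]:
    """Copy the mapped FGDC values into a dict keyed by MODS element path."""
    root = ET.fromstring(fgdc_xml)
    result: Dict[str, str] = {}
    for fgdc_path, mods_path in FGDC_TO_MODS.items():
        value = root.findtext(fgdc_path)
        # Skip elements that are absent or empty in the incoming record.
        if value is not None and value.strip():
            result[mods_path] = value.strip()
    return result


# Invented sample record; note that it has no abstract.
SAMPLE = """
<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Example County GIS</origin>
        <pubdate>2004</pubdate>
        <title>Example parcels</title>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>
"""

mods_fields = crosswalk(SAMPLE)
```

Because incoming metadata comes in many flavors, a real workflow would also need to report which expected elements were missing rather than silently skipping them.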


Metadata Availability


Harnessing Geospatial Web Services


Geospatial Web Service Types

Image services
  Deliver image resulting from query against underlying data
  Limited opportunity for analysis

Feature services
  Stream actual feature data, greater opportunity for data analysis

Other
  Geocoding services
  Routing, etc.


Accessible ArcXML Services

Geospatial Web Services Rights Issues

Example: Desktop GIS-accessible ArcIMS
39 of 100 NC counties have desktop GIS-accessible ArcIMS services
It is difficult to know how many of these counties actually expect users to either:
  A) access data through desktop GIS for viewing only, or
  B) extract and download data


Harnessing Geospatial Web Services

Automated content identification: ‘capabilities files,’ registries, catalog services

WMS (Web Map Service) for batch extraction of image atlases
  last ditch capture option
  preserve cartographic representation
  retain records of decision-making process

… feature services (WFS) later.

Rights issues in the web services space are ambiguous
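Batch extraction of an image atlas from a WMS server amounts to issuing one GetMap request per tile of a grid. The sketch below only builds the request URLs, using standard WMS 1.1.1 GetMap parameters; the function name, the `https://example.org/wms` endpoint, the `parcels` layer, and the grid extent are hypothetical, and any real harvest would first need the rights question settled.

```python
from typing import List, Tuple
from urllib.parse import urlencode

BBox = Tuple[float, float, float, float]  # minx, miny, maxx, maxy


def atlas_tile_urls(base_url: str, layers: str, bbox: BBox,
                    rows: int, cols: int,
                    width: int = 1024, height: int = 1024,
                    srs: str = "EPSG:4326",
                    image_format: str = "image/png") -> List[str]:
    """Build one WMS 1.1.1 GetMap URL per tile of a rows x cols grid."""
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    urls = []
    for r in range(rows):          # rows run from miny upward
        for c in range(cols):      # columns run from minx eastward
            tile = (minx + c * dx, miny + r * dy,
                    minx + (c + 1) * dx, miny + (r + 1) * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layers,
                "STYLES": "",      # empty value requests default styling
                "SRS": srs,
                "BBOX": ",".join(str(v) for v in tile),
                "WIDTH": width,
                "HEIGHT": height,
                "FORMAT": image_format,
            }
            # Keep commas, colons and slashes readable in the query string.
            urls.append(f"{base_url}?{urlencode(params, safe=',:/')}")
    return urls


# Hypothetical endpoint, layer and extent, for illustration only.
urls = atlas_tile_urls("https://example.org/wms", "parcels",
                       (0.0, 0.0, 2.0, 2.0), rows=2, cols=2)
```

Because the server applies its own styling to each image, the saved tiles record the cartographic representation as published, which is the point of the "last ditch" option. The image width and height should match the aspect ratio of each tile to avoid distortion.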


“Web mash-ups” and the New Mainstream Geospatial Web Services


Preserving Cartographic Representation


Preserving Cartographic Representation

The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:

Intellectual choices about symbolization, layer combinations

Data models, analysis, annotations

Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration

Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem


Preserving Cartographic Representation

Image-based approaches
  Generate images using Map Book or similar tools

Harvest existing atlas images

Capture atlases from WMS servers

Export ‘layouts’ or ‘maps’ to image

Vector-based approaches
  Store explicitly in the data format (e.g. Feature Class Representation in ArcGIS 9.2)

Archive and upward-migrate existing files .avl, .apr, .lyr, .mxd, etc.

SVG, VML or other XML approaches

Other?


Preserving Cartographic Representation


Preserving Cartographic Representation


Repository Architecture Issues

Interest in how geospatial content interacts with widely available digital repository software

Focus on salient, domain-specific issues

Challenge: remain repository agnostic
  Avoid “imprinting” on repository software environment

  Preservation package should not be the same as the ingest object of the first environment

  Tension between exploiting repository software features vs. becoming software dependent


Preserving Geodatabases

Spatial databases in general vs. ESRI Geodatabase “format”
Not just data layers and attributes; also topology, annotation, relationships, behaviors
ESRI Geodatabase archival issues
  XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication

Some looking to Geodatabase as archival platform (in addition to feature class export)


Geodatabase Availability

Local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data. According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in that format.

[Pie charts: "Cities: Street Centerline Formats" and "Counties: Street Centerline Formats". Categories in each: Geodatabase, Shapefile, Coverage, Other]


Evolving Geodatabase Handling Approaches

Project stage and planned approach:

Original Proposal (Nov. 2003): Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size

Finalized Work Plan (Dec. 2004): Also export content as Geodatabase XML

Possible Future Work Plan Changes: Explore maintenance of some archival content in Geodatabase form; explore Geodatabase replication as an archive development approach; archive Geodatabases of unlimited size


Efficient Content Replication

Content replication also needed for:
  Disaster preparedness
  State and federal data improvement projects
  Aggregation by regional geospatial web service providers

WFS, e.g.: efficiency in complete content transfer?
Rsync-like function, plus: rights management, inventory processes, metadata management, informed by data update cycles
Archiving delta files vs. complete replication: need to avoid requiring “digital archaeology” in the future
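The "rsync-like function" can be illustrated with checksum manifests: hash every file in a snapshot, then compare two manifests to see what was added, changed, or removed. This is a minimal sketch with hypothetical function names; it covers only the change-detection step, not rights management, inventory, or metadata handling.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, NamedTuple


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            manifest[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


class Delta(NamedTuple):
    added: List[str]
    changed: List[str]
    removed: List[str]


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Delta:
    """Compare two manifests and list the paths that differ."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return Delta(added, changed, removed)


# Manifests written out by hand here; build_manifest would produce them
# from two snapshots of a producer's data directory.
previous = {"parcels.shp": "1", "roads.shp": "2"}
current = {"parcels.shp": "1", "roads.shp": "3", "zoning.shp": "4"}
delta = diff_manifests(previous, current)
```

A delta like this can drive the transfer while the archive still stores complete copies of each version, which avoids having to reassemble a dataset from a chain of delta files later.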


Points of Engagement with the Open Geospatial Consortium (OGC)

GML for archiving

GeoDRM -- Adding preservation use cases

Content Packaging -- Industry solution?

Web Services Context Documents
  Can we save data state as well as application state?

Content Replication
  Is this layer in the architecture?

Persistent Identifiers


Project Outcomes

Demonstration archive
Outreach activity – planting seeds
  International, national, state, local, commercial

Learning experience, informing:
  Spatial data infrastructure
  Commercial vendors (data/software/consulting)
  Repository software communities
  Metadata practice (both GIS & preservation)
  Rights management developments
  Data and interoperability standards


Content Identification and Selection

Work from NC OneMap Data Inventory

Combine with inventory information from various state agencies and from previous NCSU efforts

Develop methodology for selecting from among “early,” “middle,” and “late” stage products

Develop criteria for time series development

Investigate use of emerging Open Geospatial Consortium technologies in data identification


Content Acquisition

Work from NC OneMap Data Sharing Agreements as a starting point (the “blanket”)
Secure individual agreements (the “quilt”)
Investigate use of OGC technologies in capture
Explore use of METS as a metadata wrapper
  Ingest FGDC metadata; Xwalk to MODS? PREMIS?
  Maybe METS DRM short term; GeoDRM long term
  Consider links to services; version management
Get the geospatial community to tackle the content packaging problem (maybe MPEG 21?)


Partnership Building

Work within context of the NC OneMap initiative
  State, local, federal partnership
  State expression of the National Map
  Defined characteristic: “Historic and temporal data will be maintained and available”
Advisory Committee drawn from the NC Geographic Information Coordinating Council subcommittees

Seek external partners
  National States Geographic Information Council
  FGDC Historical Data Committee

… more


Content Retention and Transfer

Ingest into DSpace
  Explore how geospatial content interacts with existing digital repository software environments

Investigate re-ingest into a second platform
  Challenge: keep the collection repository-agnostic

Start to define format migration paths
  Special problem: geodatabases

Pursue long-term solution
  Roles of data producing agencies, state agencies; NC OneMap; NCSU


Project Status

Completing inventory analysis stage

Storage system and backup deployed

DSpace deployed to production

Metadata workflow finalized

Ingest workflow near finalization

Content migration workflow near finalization

Regional site visits planned for coming months

Wide range of outreach/collaboration: FGDC, ESRI, EDINA (JISC), USGS, OGC, TRB, etc.

Pilot project, georegistering digital archival geologic maps


Questions?

Contact:

Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) [email protected]