Archiving State and Local Agency Digital Geospatial Data: An Overview of the Problem Area

Steven P. Morris
Head of Digital Library Initiatives
North Carolina State University Libraries

GICC Archival and Long Term Access Kickoff Meeting
February 29, 2008


Note: Percentages based on the actual number of respondents to each question

Outline

• Risks to Digital Geospatial Data
• Value in Temporal/Historical Data
• Archiving Challenges
• Content Identification and Selection Issues
• Industry Engagement
• Archives Processes
• Conclusion


NC Geospatial Data Archiving Project

• Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
• One of 8 initial NDIIPP collection building partnerships
• Focus on state and local geospatial content in North Carolina (state demonstration)
• Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
• Objective: engage existing state/federal geospatial data infrastructures in preservation
  • Serve as catalyst for discussion within industry


NCGDAP Goals

Repository Goal
• Capture at-risk data
• Explore technical and organizational challenges

Project End Goal
• Data Producers: Improved temporal data management practices
• Archives: More efficient means of acquiring and preserving data; progress towards best practices

Temporal data management vs. long-term preservation


Risks to Geospatial Data


How would you describe your current geospatial archive?

Last week’s set of nightly tape backups

Several boxes of CDs and DVDs

Bob’s hard drive

A collection of files in our “GIS Folder”

A stand-alone spatial database

The data back-end for our internet mapping application

An enterprise GIS


Digital Preservation Points of Failure

• Data is not saved, or …
• can’t be found, or …
• media is obsolete, or …
• media is corrupt, or …
• format is obsolete, or …
• file is corrupt, or …
• meaning is lost

Solutions:

• Migration
• Emulation
• Encapsulation XML


Risks to Geospatial Data

• Producer focus on current data
  • Data overwrite as common practice
• Future support of data formats in question
  • No open, supported format for vector data
• Shift to web services-based access
  • Data becoming more ephemeral
• Inadequate or nonexistent metadata
  • Impedes discovery and use
• Increasing use of spatial databases for data management
  • The whole is greater than the sum of the parts


Value in Older Geospatial Data


Value in Older Data: Cultural Heritage

Future uses of data are difficult to anticipate (as with Sanborn Maps)


Value in Older Data: Solving Business Problems

Figure: Suburban development 1993/2002, near Mecklenburg-Cabarrus County border

• Land use change analysis
• Real estate trends analysis
• Site location analysis
• Disaster response
• Resolution of legal challenges
• Impervious surface maps


Application: Impervious Surface Change Mapping

Figure panels A to D, labeled in order: 2002 Impervious; 2004 Aerial Photography; 2004 Impervious using 2002 Mask; 2004 Impervious Update


Application: Shoreline Change Mapping


Application: Land Use Change Mapping

Figure panels: Input Data; Output GIS Data

Using Mecklenburg County 2002 true color orthorectified aerial photography

Developing Areas


Preservation Challenges


Challenge: Vector Data Formats

• No widely-supported, open vector formats for geospatial data
  • Spatial Data Transfer Standard (SDTS) not widely supported
  • Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”
• Spatial Databases
  • The whole is more than the sum of the parts, and the whole is very difficult to preserve
  • Can export individual data layers for curation, but relationships and context are lost
  • Some thinking of using the spatial database as the primary archival platform


Challenge: Cartographic Representation

Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.


Challenge: Geospatial Web Services

• How to capture records from decision-making processes?


Challenge: Preservation Metadata

Figure: Metadata Archived? Bar chart of % of respondents (axis from 0.0% to 70.0%) by category: FGDC format; Locally defined metadata; NC OneMap metadata starter block; None

Results from a 2006 survey of all 100 NC counties and 25 largest NC municipalities


Challenge: Data Capture

Jurisdictions Archiving Snapshots

• Yes: 65.3%*
• No: 34.7%*

(out of 57.6% response rate)

2006 Frequency of Capture Survey targeting North Carolina counties and municipalities

*Note: Percentages based on the actual number of respondents to each question
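The percentages on this slide can be put on a common footing with a short Python sketch. It assumes the 57.6% response rate applies to the 125 jurisdictions surveyed (100 counties plus 25 municipalities) and that every respondent answered this question; the whole-number counts are inferred from the rates and are not reported on the slides.

```python
# Figures stated on the slides.
surveyed = 100 + 25        # all NC counties plus the 25 largest municipalities
response_rate = 0.576      # 57.6% response rate
yes_share = 0.653          # 65.3% of respondents archive snapshots

# Inferred, not reported in the source: whole-number counts implied by the rates.
# Assumes every respondent answered this particular question.
respondents = round(surveyed * response_rate)
yes_count = round(respondents * yes_share)
no_count = respondents - yes_count

# Share of ALL surveyed jurisdictions known to archive snapshots.
yes_of_surveyed = yes_count / surveyed

print(respondents, yes_count, no_count)
print(f"{yes_count / respondents:.1%} of respondents, "
      f"{yes_of_surveyed:.1%} of all surveyed")
```

Under these assumptions, the "two-thirds" figure describes respondents only; as a share of all jurisdictions surveyed, the confirmed proportion is considerably lower, because non-respondents are unknown.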

Challenge: Digital Object Complexity

Potential Ingest Objects

• XML Database Export
• TIFF Images
  • Pixel Value and Header file
  • World file
  • Coordinate System file
  • Metadata file
• Shapefiles
  • Geometry file
  • Index file
  • Attribute file
  • Metadata file
  • Coordinate System file
  • Spatial Index files
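The shapefile component list above can be illustrated with a short Python sketch that groups sibling files into one candidate ingest object. The extension-to-role mapping (.shp, .shx, .dbf, .prj, .shp.xml, .sbn/.sbx) reflects common shapefile conventions rather than anything stated on the slide, and the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed mapping from common shapefile extensions to the roles named on the slide.
SHAPEFILE_ROLES = {
    ".shp": "geometry",
    ".shx": "index",
    ".dbf": "attributes",
    ".shp.xml": "metadata",
    ".prj": "coordinate system",
    ".sbn": "spatial index",
    ".sbx": "spatial index",
}


def split_name(filename):
    """Return (stem, extension), treating '.shp.xml' as a single extension."""
    name = PurePath(filename).name
    if name.lower().endswith(".shp.xml"):
        return name[: -len(".shp.xml")], ".shp.xml"
    path = PurePath(name)
    return path.stem, path.suffix.lower()


def group_shapefile_parts(filenames):
    """Group file names by shared stem and label each part's role."""
    datasets = defaultdict(dict)
    for filename in filenames:
        stem, ext = split_name(filename)
        role = SHAPEFILE_ROLES.get(ext)
        if role is not None:  # ignore files that are not shapefile parts
            datasets[stem].setdefault(role, []).append(filename)
    return dict(datasets)


def missing_required(parts):
    """Report which of the geometry, index, and attribute files are absent."""
    return [r for r in ("geometry", "index", "attributes") if r not in parts]
```

For example, `missing_required(group_shapefile_parts(["roads.shp", "roads.dbf"])["roads"])` reports that the index file is absent, which is the kind of check an archive might run before ingest.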


Where is the Dataset?


Here’s One!

Files

• Multi-file dataset
• Georeferencing
• Metadata file
• Symbolization file
• Additional documentation
• License
• Disclaimer
• More

Metadata

• FGDC
• Acquisition metadata
• Transfer metadata
• Ingest metadata
• Archive rights
• Archive processes
• Collection metadata
• Series metadata


Other Challenges

• Rights management
• Data versioning
• Semantic issues
• Large scale content transfer
• Integrating older analog data
• More …


Different Ways to Approach Preservation

Technical solutions: How do we preserve acquired content over the long term?

Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?

Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata


Content Identification and Selection Issues


What do Inventories (e.g. RAMONA) Offer to Archives?

• Data Availability Information
  • Detailed information by data layer
• Contact Information
• Minimal Metadata
  • Descriptive, technical, administrative
• Rights Information
• Document Technical Environment
  • Software used, formats, transfer methods
• Future Data Development Plans
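The inventory contents listed above could be modeled, for archive planning purposes, as a simple per-layer record. The following Python sketch is purely illustrative: the class, its field names, and the `gaps` helper are hypothetical and do not represent the actual RAMONA schema.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryRecord:
    """Hypothetical per-layer inventory entry; not the actual RAMONA schema."""
    layer_name: str
    available: bool                  # data availability, by data layer
    contact: str                     # contact information
    rights: str = ""                 # rights information
    descriptive: str = ""            # minimal descriptive metadata
    software: str = ""               # technical environment: software used
    formats: list = field(default_factory=list)
    transfer_methods: list = field(default_factory=list)
    future_plans: str = ""           # future data development plans

    def gaps(self):
        """Names of empty fields an archive would still need from the producer."""
        checked = ("rights", "descriptive", "software",
                   "formats", "transfer_methods")
        return [name for name in checked if not getattr(self, name)]


record = InventoryRecord(
    layer_name="parcels",
    available=True,
    contact="county GIS office",
    formats=["shapefile"],
)
print(record.gaps())
```

A record like this lets an archive see, before making contact with a producer, which rights and technical details are still undocumented for a given layer.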


Selection Issues

• Most content is already at some level of risk
• Early-Middle-Late Stage issues
  • Middle stage is usually the “sweet spot”, e.g. TIFF orthophotos vs. raw images or compressed images
  • Also added-value products: digital maps, cartographic representation
• Digital maps: “record” or not?
• Frequency of capture


Problem: Multiple choices for format type, coordinate system, tiling scheme


Geospatial Data Types – Spatial Databases

• Vector and raster data

• Relationships
• Behaviors
• Annotation
• Data Models


Geospatial Data Types – Cartographic

• GIS Software
  • Software project file (.mxd, .apr, …)
  • Data layer file (.avl, .lyr, …)
• PDF map exports
• Web Services-based representations


Other Geospatial Data Types – Place-based Data

• Oblique Imagery
• Road Videologs
• Tax Dept. Photos
• Street View Images
• Mobile, LBS, and social networking applications

Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function


Time series – vector data

Figure: Parcel Boundary Changes 2001-2004, North Raleigh, NC

Continuously updated data:
• Frequency of snapshots?
• Different for various framework layers?
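One way to picture a snapshot practice for continuously updated layers is a date-stamped copy that is never overwritten. The Python sketch below is a minimal illustration under assumed conventions: the function name, the `archive_root/layer_name/YYYY-MM-DD` directory layout, and the refusal to overwrite are all hypothetical choices, not a practice described in the survey.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_layer(source_dir, archive_root, layer_name, as_of=None):
    """Copy a layer's files into archive_root/layer_name/YYYY-MM-DD.

    Refuses to overwrite an existing snapshot, so older states are retained.
    """
    as_of = as_of or date.today()
    target = Path(archive_root) / layer_name / as_of.isoformat()
    if target.exists():
        raise FileExistsError(f"snapshot already exists: {target}")
    # copytree creates the target directory and any missing parents.
    shutil.copytree(source_dir, target)
    return target
```

How often such a function would be run (quarterly, annually, or at a different rate per framework layer) is exactly the open question the slide raises; the sketch only shows that each capture keeps its own dated copy.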


Sept. 2006 Frequency of Capture Survey

Survey objective:
• Document current practices for obtaining archival snapshots of county/municipal geospatial vector data layers
• Seek guidance about frequency of capture

Survey topics:
• General questions about data archiving practice
• Specific questions about parcels, street centerlines, jurisdictional boundaries, and zoning

Survey subjects:
• All 100 counties and 25 municipalities
• 58% response rate
• Survey conducted September 2006


Data Capture Survey Results: Overview

• Two-thirds of responding agencies create and retain periodic snapshots
• Long-term retention more common in counties with larger populations
• Storage environments vary, with servers and CD-ROMs most common
• Offsite storage (or both onsite and offsite) is used by nearly half of the respondents
• Popularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents


Survey Observations

• Process of survey formulation and implementation helped to socialize the problem of archiving data
• Local innovation needs to be mined further to inform development of best practices
• Business drivers for archiving need more study (e.g., stated adherence to retention policy)
• Exposure to peer practice encourages archiving
• Pronounced local interest in scanning/rectifying older analog maps and imagery


Engaging Industry


Points of Engagement with Spatial Data Infrastructure (e.g. NC OneMap)

Where does archiving and preservation fit in?

• Framework data communities
  • Snapshot frequency, naming schemes, classification, GML application schemas, format strategies
• Metadata standards and outreach
  • Persistent identifiers, versioning, feedback on metadata quality
• Content exchange networks/content replication
  • For data improvement projects, disaster preparedness, aggregation by regional service providers, … and archives


Content Exchange Infrastructure

• High volume of state/federal requests for local data
• Solving the present-day problems of data sharing is a pre-requisite to solving the problem of long-term access
• Leveraging more compelling business reasons to put the data in motion (disaster preparedness, business continuity, highway construction, census, …)
• Content exchange networks:
  • Minimize need to make contact
  • Add technical, administrative, descriptive metadata
  • Establish rights and provenance


Archives Processes


Maine GeoArchives Project Components

• Retention schedules
  • Geospatial data
  • Administrative records
• Record accessioning
• Appraisal system
• System documentation
• Archival data and metadata standards
• Rules for disposition of local government records


Maine GeoArchives: Functional Requirements

Adopted set of functional requirements for recordkeeping systems to ensure permanent retention of data layers:

• Compliance
• Responsible
• Credibility
• Completeness
• Authenticity
• Soundness
• Auditability
• Availability
• Exportable
• Renderable
• Redactable


Conclusion


Key issues

• What are the points of intersection between archive needs and business continuity/disaster preparedness and other business needs?
• How to best stimulate and learn from innovation at the state/regional/local level?
• How to make data more preservable from point of production and on through data transfer
• How to most effectively move data in an efficient, well-documented manner with clarified rights
• How to best make State Archives a part of spatial data infrastructure?
• Defining the record: data vs. derivative components


Cultural: Changing Industry Thinking

• Is the geospatial industry “temporally-impaired?”
  • Lack of access to older data
  • Lack of tool/model support for temporal analysis
  • Metadata: poor support for changing data
  • Education: building class projects around available data (i.e., not temporal)
• Increased interest now in temporal applications?
  • Increased demand for temporal data?
  • Improved tool support: ArcGIS 9.2 animation tools; Geodatabase History, etc.


Questions?

Contact:

Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) [email protected]

http://www.lib.ncsu.edu/ncgdap