State and Local Agency Digital Geospatial Data Preservation: The North Carolina Experience

Steve Morris
NCSU Libraries

Earth Sciences Information Partners (ESIP) Workshop
July 8, 2009


NC Geospatial Data Archiving Project (NCGDAP)

• One of eight initial collection-building projects in the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP)
• Lead organizations: North Carolina State University Libraries and North Carolina Center for Geographic Information & Analysis (NCCGIA)
• Focus: State and local government geospatial data in NC
• Repository development as catalyst for discussion
• Goal: Engage spatial data infrastructure in data archiving
• Initial 3-year project extended to Dec. 2009

NCGDAP Data Types – Raster

• Digital orthophotography
• Satellite imagery

Static data

NCGDAP Data Types – Vector Data

• Point, line, and polygon
• Attached attribute data

Often updated


Note: Percentages based on the actual number of respondents to each question

Downtown Raleigh, NC, near State Capitol (2005 Wake County Ortho)

Imagery = Durable
• Static
• Simple structure
• Mostly open formats

Vector data = Volatile
• Frequent update
• Complex structure
• Mostly proprietary (commercial) formats

NCGDAP Data Types – Spatial Databases

• Vector and raster data
• Relationships
• Behaviors
• Annotation
• Data Models

Geospatial Data: Compelling Issues

• Dynamic content
– Constantly updated information
– Data versioning
• Digital object complexity
– Spatially-enabled databases
– Complicated, multi-component formats
– Proprietary formats

Ingest Challenges: General

• Data consists of multi-file, multi-format objects
• Ancillary data files can be shared by datasets
• Some format conversions involve one-to-many relationships
• Compressed archive files are common and behave unpredictably
• And all the usual challenges: format validation, validity checking, threat scanning,…
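
One way to reduce surprises from compressed archive files is to inspect them before extraction. The following is a minimal sketch, not the NCGDAP tooling: the function name `inspect_zip`, the report keys, and the list of nested-archive suffixes are all assumptions made for illustration. It uses only the Python standard library `zipfile` module.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed list of suffixes that indicate an archive nested inside the ZIP.
NESTED_SUFFIXES = {".zip", ".gz", ".tar", ".tgz", ".7z"}


def inspect_zip(source):
    """Report on a ZIP file's members without extracting anything.

    `source` may be a filesystem path or a binary file-like object.
    """
    report = {"members": [], "nested_archives": [], "unsafe_paths": []}
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            path = PurePosixPath(info.filename)
            report["members"].append(info.filename)
            # Nested archives need a second round of inspection.
            if path.suffix.lower() in NESTED_SUFFIXES:
                report["nested_archives"].append(info.filename)
            # Absolute or parent-relative names could escape the target folder.
            if path.is_absolute() or ".." in path.parts:
                report["unsafe_paths"].append(info.filename)
    return report
```

A report like this can be reviewed by a person first, which fits the "learn first from human intervention" approach before any extraction is automated.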


Where is the Dataset?

Here’s One!

Files

• Multi-file dataset
• Georeferencing
• Metadata file
• Symbolization file
• Additional documentation
• License
• Disclaimer
• More

Metadata

• FGDC
• Acquisition metadata
• Transfer metadata
• Ingest metadata
• Archive rights
• Archive processes
• Collection metadata
• Series metadata

Ingest Challenges: Metadata

• Metadata is encoded in a variety of ways
– The FGDC content standard for metadata lacked an encoding standard (it arrived pre-XML); this is addressed in the ISO 19115/19139 North American Profile implementation
– XML (varied schemas), TXT, HTML
• Metadata is missing
– Only about 25% of local agencies use FGDC
• Metadata is wrong
– Metadata is commonly asynchronous with the data
• Inconsistent use of dataset naming, etc.
– e.g., “Streets” vs. “Wake County Streets”
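
Because metadata arrives as XML, TXT, or HTML, an ingest workflow usually has to guess the encoding before it can parse anything. The sketch below is an illustration only: the function name `sniff_metadata_encoding` and the heuristics are assumptions, and real records would need more careful handling (for example, XML validated against a schema).

```python
def sniff_metadata_encoding(text):
    """Classify a metadata record as 'xml', 'html', or 'txt'.

    Heuristic only: looks at the first 2000 characters of the record.
    """
    # Drop a byte-order mark and leading whitespace before inspecting.
    head = text.lstrip("\ufeff \t\r\n")[:2000].lower()
    # HTML is checked first so that XHTML with an XML prolog counts as HTML.
    if head.startswith("<!doctype html") or "<html" in head:
        return "html"
    # Any other markup is treated as XML (schemas vary widely).
    if head.startswith("<"):
        return "xml"
    # Everything else is assumed to be an indented plain-text record.
    return "txt"
```

For example, `sniff_metadata_encoding("Identification_Information:\n  Citation:")` falls through to the plain-text branch.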

NCGDAP Metadata Summary

• Existing geospatial metadata often needs:
– Remediation – to fix errors or omissions
– Normalization – to adhere to a standard structure
– Synchronization – so that the data at hand matches the metadata
• If no metadata then:
– Can build minimal metadata using templates and auto-extraction
– Lose key information such as data quality, lineage, data dictionaries
• Automating metadata for repository ingest
– Raster data is easy – large sets of consistently structured files
– Vector data is hard – each dataset is a different story
• Many additional administrative and technical metadata elements not accommodated by FGDC
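
As an illustration of building minimal metadata from a template plus auto-extraction, the sketch below reads the fixed 100-byte header of a shapefile's .shp component and fills a small template. The header layout (file code 9994, shape type at byte 32, bounding box at bytes 36 to 67) follows the published ESRI Shapefile technical description; the function names and template fields are hypothetical and are not the NCGDAP implementation.

```python
import struct

# Subset of shape type codes from the shapefile specification.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(data):
    """Extract geometry type and bounding box from a .shp file header."""
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", data[0:4])   # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile (bad file code)")
    (shape_type,) = struct.unpack("<i", data[32:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "shape_type": SHAPE_TYPES.get(shape_type, f"Unknown ({shape_type})"),
        "bbox": (xmin, ymin, xmax, ymax),
    }


def minimal_metadata(title, header):
    """Fill a hypothetical minimal template from auto-extracted values."""
    xmin, ymin, xmax, ymax = header["bbox"]
    return {
        "title": title,
        "geometry_type": header["shape_type"],
        "west": xmin, "east": xmax, "south": ymin, "north": ymax,
        # These cannot be auto-extracted; they stay empty without the producer.
        "data_quality": None,
        "lineage": None,
    }
```

The empty `data_quality` and `lineage` fields make the loss noted above explicit: extraction recovers structure and extent, not the producer's knowledge of the data.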

Extended Curation: Feedback and Outreach

[Diagram. Process labels: Data Receipt, Format Processing, Metadata Processing, Ingest Processes. Audience labels: Content Producers, Industry, Standards Organizations.]

Spatial Data Infrastructure and Archiving

• Metadata standards and outreach
– Metadata quality, best practices
• Inventories
– Reduce “contact fatigue”, shareable information store
• Content exchange networks
– Leverage more compelling business reasons to put data in motion
– Automate process, add technical & administrative metadata
• Framework data communities
– Snapshot frequency, schemas, format strategies

Content Packaging Issues

• Geospatial datasets are typically complex, multi-file objects
• Data are often accompanied by ancillary data, which must be associated with the data item
• Rights information and licenses must be associated with the item
• Various implementations in different domains (METS, IMS-CP, XFDU, etc.)
• Simpler .zip-based packages also used (MEF, KMZ, etc.)


Spatial Database Approaches

Manage database forward over time

Extract data layers to preservable form

Set aside archival snapshot of database

GeoMAPP: Geospatial Multistate Archival and Preservation Partnership

• Partners (NC, KY, UT, Library of Congress, NCSU):
– State geospatial organizations
– State Archives
• State-to-state and geo-to-Archives collaboration
– Organizational and technical diversity across states
• Archives as part of spatial data infrastructure
– Selection and appraisal processes
– Retention schedule development
– Data transfer to archives
– Development of enhanced business cases


NCGDAP Learning Outcomes

Preservation of GIS projects is needed to support re-creation of past work

Preservation of data representations is needed to document decision-making processes

Validation, remediation, and conversion of data and metadata is expensive: push for improvements upstream

Some repositories handle “items”: can result in “atomization” of data

For vendors, frame data preservation as a “customer problem” -- must build the business case

Thank You!

Steve Morris
Head, Digital Library Initiatives
North Carolina State University
[email protected]

North Carolina Geospatial Data Archiving Project
http://www.lib.ncsu.edu/ncgdap

GeoMAPP
http://www.geomapp.net

Draft of Utah’s GIS to Archives Data Flow

• AGRC exports data from SGID and splits out datasets by series. Metadata occasionally incomplete.
• Local governments supply GIS datasets on CD/DVD to AGRC. Metadata often missing.
• All metadata is completed to FGDC standards.
• AGRC creates geoPDF files of individual datasets, plus ZIP files of the native format.
• One ZIP file would contain all the pieces belonging to one shapefile or, alternatively, the file would contain a geodatabase.
• Geodatabases would not be just one big database with everything in it (multiple series and years).
• Instead, the native files would be composed of a single downloadable file per series per year.
• AGRC copies these files to Archives’ FTP server.
• Metadata harvested to populate Archives’ finding aids.

Example FTP site structure: ftp.archives-agrc.utah.gov/Archives

o Biota (Dublin Core metadata)
o Boundaries (Dublin Core metadata)
    MunicipalityRecords-Series-26846 (Dublin Core metadata)
        2000
            o MunicipalBoundaries.zip (FGDC metadata)
            o MunicipalBoundaries.pdf (FGDC metadata)
        2001
        2002
        2003
    CountyBoundaries-Series-26845 (Dublin Core metadata)
        2003
        2004
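
The "one file per series per year" layout can be expressed as a small path-building helper. This is a sketch inferred from the example FTP structure on the slide; the function names and the argument breakdown (category, series title, series number, year) are assumptions, not AGRC's or the Utah State Archives' actual tooling.

```python
from pathlib import PurePosixPath


def archive_path(category, series_title, series_number, year, filename):
    """Build a relative path of the form category/series/year/file."""
    # Series folders on the slide look like "<title>-Series-<number>".
    series_dir = f"{series_title}-Series-{series_number}"
    return str(PurePosixPath(category) / series_dir / str(year) / filename)


def deliverables(dataset_name):
    """Each dataset is delivered as a native-format ZIP plus a geoPDF."""
    return [f"{dataset_name}.zip", f"{dataset_name}.pdf"]


# Example: paths for the 2000 municipal boundaries snapshot.
paths = [
    archive_path("Boundaries", "MunicipalityRecords", 26846, 2000, name)
    for name in deliverables("MunicipalBoundaries")
]
```

Keeping the year as its own directory level means each snapshot can be transferred and described independently, instead of living inside one large multi-year geodatabase.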

Kentucky Metadata Workflow into DSpace and iRODS Environment

[Workflow diagram. Recoverable labels:]

• Metadata & content entered by agencies using template and modified by Archivist
• Single item & batch ingest into DSpace by Archivist
• DSpace
• Database with Dublin Core Descriptive and Administrative Metadata
• Content Files
• iRODS
• Distributed Storage Layer
• UNC, KDLA, other
• Batch metadata extraction using iRODS rules
• Preservation metadata from iRODS rules
• Database with Administrative & Preservation Metadata

Source Metadata Translation

• Hub-and-spoke model a la Echo DEPository
• Repository agnostic
• Modular conversion hub
• Facilitate repository software migration & inter-archive exchange
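
The appeal of a hub-and-spoke model is that each format needs only two converters (to and from the hub) instead of one converter per pair of formats. The sketch below illustrates the idea under stated assumptions: the class `MetadataHub`, the two-field hub record, and the simplified FGDC-like and Dublin Core-like records are hypothetical and do not reproduce the Echo DEPository tools.

```python
class MetadataHub:
    """Translate records between formats via one shared hub representation."""

    def __init__(self):
        self._to_hub = {}
        self._from_hub = {}

    def register(self, fmt, to_hub, from_hub):
        """Add one spoke: a pair of converters for a single format."""
        self._to_hub[fmt] = to_hub
        self._from_hub[fmt] = from_hub

    def translate(self, record, source, target):
        """Convert source -> hub -> target; no pairwise converter is needed."""
        return self._from_hub[target](self._to_hub[source](record))


# Simplified FGDC-like spoke (nested element names follow the FGDC layout).
def fgdc_to_hub(rec):
    idinfo = rec["idinfo"]
    return {"title": idinfo["citation"]["citeinfo"]["title"],
            "abstract": idinfo["descript"]["abstract"]}


def hub_to_fgdc(hub_rec):
    return {"idinfo": {"citation": {"citeinfo": {"title": hub_rec["title"]}},
                       "descript": {"abstract": hub_rec["abstract"]}}}


# Simplified Dublin Core-like spoke (flat keys).
def dc_to_hub(rec):
    return {"title": rec["dc:title"], "abstract": rec["dc:description"]}


def hub_to_dc(hub_rec):
    return {"dc:title": hub_rec["title"], "dc:description": hub_rec["abstract"]}


hub = MetadataHub()
hub.register("fgdc", fgdc_to_hub, hub_to_fgdc)
hub.register("dc", dc_to_hub, hub_to_dc)
```

Supporting a new repository format then means writing one new spoke, which is what makes repository software migration and inter-archive exchange more tractable.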

GeoMAPP: Geospatial Multistate Archival and Preservation Partnership

• Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress
• Partners:
– State geospatial organizations of Kentucky and Utah
– State Archives of Kentucky and Utah
– NCSU Libraries in catalytic/advisory role
• State-to-state and geo-to-Archives collaboration
• 2-year project: Nov. 2007-Dec. 2009
• Archives as part of Spatial Data Infrastructure

GeoMAPP: Project Components

• Introduce GIS organizations and State Archives to each other
• Archival selection and appraisal processes
• Retention schedule development
• Data transfer to archives
• Development of enhanced business case

NC Geospatial Data Archiving Project (NCGDAP)

• Repository Goal
– Capture at-risk data
– Explore technical and organizational challenges
• Project End Goal
– Data Producers: Improved temporal data management practices
– Archives: More efficient means of acquiring and preserving data; progress towards best practices

Temporal data management vs. long-term preservation

Geospatial Data Preservation Challenges

• Data capture
– Backups are common, but not long-term archives
– Producer focus on current data
– Shift to web services-based access
• Inadequate or non-existent metadata
– Consistent NC survey statistics: Only 40% of data producers create and maintain metadata
– Existing metadata often needs to be normalized, synchronized with the data, and remediated

Loss of memory about the data is also a problem

Ongoing Challenges

• When to automate and when not to
– Learn first from human intervention
– Minimizing risk of error related to human intervention
• Accepting that ingest packages used will evolve over time (implications for archive?)
• Handling post-ingest migrations

Challenge: Preservation Metadata

[Bar chart: “Metadata Archived?” Vertical axis: % of Respondents (0.0% to 70.0%). Categories: FGDC format; Locally defined metadata; NC OneMap metadata starter block; None.]

Results from a 2006 survey of all 100 NC counties and 25 largest NC municipalities

Some Key Metadata Decisions

• Capture “transfer set” metadata
• Normalize, synchronize, and remediate existing metadata, and retain original metadata record
• Treat contact information as archival
• Update metadata with format conversions
• Use ESRI Profile of FGDC
– Added technical and administrative elements
– Has an XML schema
– ArcCatalog tool support
• Use simple rights encoding scheme
• Record metadata in a workflow management database

NCSU Libraries, 27 March 2006, Digital Preservation in State Government - Wilmington

SIP Item Creation: Workflow

• Submission Information Package grouping
– Ontology logic based on defined multi-file complex format components and directory structure
• Repository-agnostic item grouping
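
The grouping step can be sketched as a function that assigns files to items using known component extensions and the directory a file sits in. This is a simplified illustration for shapefiles only: the function name `group_items`, the extension list, and the rule that unknown files become standalone items are assumptions, not the project's ontology logic.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Common shapefile component extensions (assumed, not exhaustive).
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg"}
SIDECAR_METADATA = ".shp.xml"  # metadata file stored alongside the .shp


def group_items(paths):
    """Group file paths into items keyed by directory plus base name.

    The result is a plain dict, so it does not depend on any repository
    software's item model.
    """
    items = defaultdict(list)
    for p in sorted(paths):
        path = PurePosixPath(p)
        if path.name.lower().endswith(SIDECAR_METADATA):
            # "streets.shp.xml" belongs with "streets.shp".
            key = path.parent / path.name[: -len(SIDECAR_METADATA)]
        elif path.suffix.lower() in SHAPEFILE_PARTS:
            key = path.parent / path.stem
        else:
            key = path  # unrecognized files become standalone items
        items[str(key)].append(p)
    return dict(items)
```

Files that land in standalone items (licenses, disclaimers, shared ancillary data) are the cases most likely to need a human decision about which dataset they belong to.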

Metadata Overview

• Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
– Version one (1994) mandated for use by federal agencies
– Descriptive metadata, plus some administrative and technical
– Extensive use at state level, spotty use at local level
– Problem: content standard without an encoding spec
– FGDC profiles: ESRI, NBII, Remote Sensing, etc.
• ISO Standards
– ISO 19115: Geographic Information – Metadata (2003)
– ISO 19139: Geographic Information – Metadata – XML schema implementation (2007)
– North American Profile of ISO to replace FGDC CSDGM