Archiving State and Local Agency Digital Geospatial Data: An Overview of the Problem Area
Steven P. Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
GICC Archival and Long Term Access Kickoff Meeting February 29, 2008
Note: Percentages based on the actual number of respondents to each question
Outline
Risks to Digital Geospatial Data
Value in Temporal/Historical Data
Archiving Challenges
Content Identification and Selection Issues
Industry Engagement
Archives Processes
Conclusion
NC Geospatial Data Archiving Project
Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
One of 8 initial NDIIPP collection building partnerships
Focus on state and local geospatial content in North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
Objective: engage existing state/federal geospatial data infrastructures in preservation
• Serve as catalyst for discussion within industry
NCGDAP Goals
Repository Goal
• Capture at-risk data
• Explore technical and organizational challenges
Project End Goal
• Data Producers: Improved temporal data management practices
• Archives: More efficient means of acquiring and preserving data
• Progress towards best practices
Temporal data management vs. long-term preservation
Risks to Geospatial Data
How would you describe your current geospatial archive?
Last week’s set of nightly tape backups
Several boxes of CDs and DVDs
Bob’s hard drive
A collection of files in our “GIS Folder”
A stand-alone spatial database
The data back-end for our internet mapping application
An enterprise GIS
Digital Preservation Points of Failure
Data is not saved, or
… can’t be found, or
… media is obsolete, or
… media is corrupt, or
… format is obsolete, or
… file is corrupt, or
… meaning is lost
Solutions:
Migration
Emulation
Encapsulation
XML
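Of the failure points listed above, "file is corrupt" is one an archive can detect mechanically. The following is a minimal Python sketch of a fixity check, assuming a SHA-256 checksum manifest is recorded when data is received and compared later. The function names `sha256_of`, `build_manifest`, and `changed_files` are illustrative and are not part of any project tooling described in these slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or no longer match."""
    current = build_manifest(folder)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A manifest built at ingest and re-checked on a schedule catches silent corruption and missing files. It does nothing for the other failure points, such as obsolete formats or lost meaning.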
Risks to Geospatial Data
Producer focus on current data
• Data overwrite as common practice
Future support of data formats in question
• No open, supported format for vector data
Shift to web services-based access
• Data becoming more ephemeral
Inadequate or nonexistent metadata
• Impedes discovery and use
Increasing use of spatial databases for data management
• The whole is greater than the sum of the parts
Value in Older Geospatial Data
Value in Older Data: Cultural Heritage
Future uses of data are difficult to anticipate (as with Sanborn Maps)
Value in Older Data: Solving Business Problems
Suburban Development 1993/2002
Near Mecklenburg-Cabarrus County border
Land use change analysis
Real estate trends analysis
Site location analysis
Disaster response
Resolution of legal challenges
Impervious surface maps
Problem: Flood and Hurricane Preparedness
Application: Impervious Surface Change Mapping
[Figure panels A–D: 2002 Impervious; 2004 Aerial Photography; 2004 Impervious using 2002 Mask; 2004 Impervious Update]
Problem: Beach Erosion and Shoreline Change
Problem: Tracking Land Use Change
Application: Land Use Change Mapping
[Figure panels: Input Data; Output GIS Data]
Using Mecklenburg County 2002 true color orthorectified aerial photography
Developing Areas
Preservation Challenges
Challenge: Vector Data Formats
No widely-supported, open vector formats for geospatial data
• Spatial Data Transfer Standard (SDTS) not widely supported
• Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”
Spatial Databases
• The whole is more than the sum of the parts, and the whole is very difficult to preserve
• Can export individual data layers for curation, but relationships and context are lost
• Some thinking of using the spatial database as the primary archival platform
Challenge: Cartographic Representation
Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
Challenge: Preservation Metadata
Metadata Archived?
[Bar chart: % of respondents (axis 0.0% to 70.0%) by metadata type: FGDC format; locally defined metadata; NC OneMap metadata starter block; none]
Results from a 2006 survey of all 100 NC counties and 25 largest NC municipalities
Challenge: Data Capture
Jurisdictions Archiving Snapshots
Response: yes = 65.3%, no = 34.7%*
*(out of 57.6% response rate)
[Pie chart: Yes 65.3%; No 34.7%; No response]
2006 Frequency of Capture Survey targeting North Carolina counties and municipalities
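The percentages on this slide can be sanity-checked with a little arithmetic. The sketch below assumes the surveyed population is the 125 jurisdictions named in the survey description (100 counties plus 25 municipalities). The respondent counts it derives (72 responses, 47 "yes", 25 "no") are inferred from the published percentages and are not stated in the slides.

```python
# Surveyed population as described in the survey caption: 100 counties + 25 municipalities.
SURVEYED = 100 + 25

# Published figures from the slide.
RESPONSE_RATE = 0.576
YES_SHARE = 0.653

# Inferred counts (assumption: not given in the source, derived by rounding).
respondents = round(SURVEYED * RESPONSE_RATE)
yes_count = round(respondents * YES_SHARE)
no_count = respondents - yes_count


def share(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(respondents, yes_count, no_count)
print(share(respondents, SURVEYED), share(yes_count, respondents), share(no_count, respondents))
```

Under these assumptions the inferred counts reproduce the slide's 57.6%, 65.3%, and 34.7% exactly. That is consistent with the footnote that percentages are based on respondents to each question rather than on all jurisdictions surveyed.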
Challenge: Digital Object Complexity
XML Database Export
TIFF Images
• Pixel Value and Header file
• World file
• Coordinate System file
• Metadata file
Shapefiles
• Geometry file
• Index file
• Attribute file
• Metadata file
• Coordinate System file
• Spatial Index files
Potential Ingest Objects
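Because a single shapefile dataset is really a bundle of files, an ingest process needs to confirm the bundle is complete before accepting it. The sketch below groups file names by dataset and reports missing components. The extension mapping is an assumption based on the common Esri shapefile convention (`.shp` geometry, `.shx` index, `.dbf` attributes, `.prj` coordinate system, `.shp.xml` metadata, `.sbn`/`.sbx` spatial index); the slides name the components but not the extensions, and the function names are illustrative.

```python
from collections import defaultdict
from pathlib import Path

# Assumed extension conventions; the three REQUIRED parts are the minimum
# a shapefile needs to be readable.
REQUIRED = {".shp", ".shx", ".dbf"}
OPTIONAL = {".prj", ".sbn", ".sbx", ".shp.xml"}


def component_suffix(name: str) -> str:
    """Return the lower-cased component suffix, treating '.shp.xml' as one unit."""
    lower = name.lower()
    if lower.endswith(".shp.xml"):
        return ".shp.xml"
    return Path(lower).suffix


def group_shapefiles(filenames: list[str]) -> dict[str, set[str]]:
    """Group recognized shapefile components by dataset name."""
    groups: dict[str, set[str]] = defaultdict(set)
    for name in filenames:
        suffix = component_suffix(name)
        if suffix in REQUIRED | OPTIONAL:
            stem = name[: len(name) - len(suffix)]
            groups[stem].add(suffix)
    return dict(groups)


def missing_components(filenames: list[str]) -> dict[str, list[str]]:
    """Report, per dataset, any required components that are absent."""
    return {
        stem: sorted(REQUIRED - found)
        for stem, found in group_shapefiles(filenames).items()
        if REQUIRED - found
    }
```

For example, a delivery containing `roads.shp` and `roads.dbf` but no `roads.shx` would be reported as `{"roads": [".shx"]}`. Files that are not shapefile components, such as a license or disclaimer, are ignored by this check and would need separate handling.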
Where is the Dataset?
Here’s One!
Files
• Multi-file dataset
• Georeferencing
• Metadata file
• Symbolization file
• Additional documentation
• License
• Disclaimer
• More
Metadata
• FGDC
• Acquisition metadata
• Transfer metadata
• Ingest metadata
• Archive rights
• Archive processes
• Collection metadata
• Series metadata
Other Challenges
Rights management
Data versioning
Semantic issues
Large scale content transfer
Integrating older analog data
More …
Different Ways to Approach Preservation
Technical solutions: How do we preserve acquired content over the long term?
Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?
Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata
Content Identification and Selection Issues
What do Inventories (e.g. RAMONA) Offer to Archives?
Data Availability Information
• Detailed information by data layer
Contact Information
Minimal Metadata
• Descriptive, technical, administrative
Rights Information
Document Technical Environment
• Software used, formats, transfer methods
Future Data Development Plans
Selection Issues
Most content is already at some level of risk
Early-Middle-Late Stage issues
Middle stage is usually the “sweet spot”, e.g. TIFF orthophotos vs. raw images or compressed images
Also added-value products: digital maps, cartographic representation
Digital maps: “record” or not?
Frequency of capture
Problem: Multiple choices for format type, coordinate system, and tiling scheme
Geospatial Data Types – Spatial Databases
• Vector and raster data
• Relationships
• Behaviors
• Annotation
• Data Models
Geospatial Data Types – Cartographic
GIS Software
• Software project file (.mxd, .apr, …)
• Data layer file (.avl, .lyr, …)
PDF map exports
Web Services-based representations
Other Geospatial Data Types – Place-based Data
Mobile, LBS, and social networking applications
Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function
• Oblique Imagery
• Road Videologs
• Tax Dept. Photos
• Street View Images
Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC
Continuously updated data: Frequency of snapshots?
Different for various framework layers?
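Whatever capture frequency is chosen, snapshots are only useful later if they can be told apart and ordered in time. The sketch below shows one hypothetical naming and scheduling convention. The pattern `jurisdiction_layer_YYYYMMDD` and the per-layer interval are assumptions for illustration, not practices documented in these slides.

```python
from datetime import date


def snapshot_name(jurisdiction: str, layer: str, captured: date) -> str:
    """Build a sortable snapshot name such as 'wake-county_parcels_20040115'."""
    parts = [jurisdiction, layer, captured.strftime("%Y%m%d")]
    return "_".join(p.strip().lower().replace(" ", "-") for p in parts)


def snapshot_due(last_capture: date, today: date, interval_days: int) -> bool:
    """Return True when at least interval_days have passed since the last capture."""
    return (today - last_capture).days >= interval_days
```

Putting the capture date last in year-month-day order means names for the same layer sort chronologically. Passing a different `interval_days` per layer is one way to express the idea that parcels and jurisdictional boundaries may warrant different frequencies.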
Sept. 2006 Frequency of Capture Survey
Survey objective:
• Document current practices for obtaining archival snapshots of county/municipal geospatial vector data layers
• Seek guidance about frequency of capture
Survey topics:
• General questions about data archiving practice
• Specific questions about parcels, street centerlines, jurisdictional boundaries, and zoning
Survey subjects:
• All 100 counties and 25 municipalities
• 58% response rate
• Survey conducted September 2006
Data Capture Survey Results: Overview
Two-thirds of responding agencies create and retain periodic snapshots
Long-term retention more common in counties with larger populations
Storage environments vary, with servers and CD-ROMs most common
Offsite storage (or both onsite and offsite) is used by nearly half of the respondents
Popularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents
Survey Observations
Process of survey formulation and implementation helped to socialize the problem of archiving data
Local innovation needs to be mined further to inform development of best practices
Business drivers for archiving need more study (e.g., stated adherence to retention policy)
Exposure to peer practice encourages archiving
Pronounced local interest in scanning/rectifying older analog maps and imagery
Points of Engagement with Spatial Data Infrastructure (e.g. NC OneMap)
Framework data communities
• Snapshot frequency, naming schemes, classification, GML application schemas, format strategies
Metadata standards and outreach
• Persistent identifiers, versioning, feedback on metadata quality
Content exchange networks/content replication
• For data improvement projects, disaster preparedness, aggregation by regional service providers, … and archives
Where does archiving and preservation fit in?
Content Exchange Infrastructure
High volume of state/federal requests for local data
Solving the present-day problems of data sharing is a prerequisite to solving the problem of long-term access
Leveraging more compelling business reasons to put the data in motion (disaster preparedness, business continuity, highway construction, census, …)
Content exchange networks:
• Minimize need to make contact
• Add technical, administrative, descriptive metadata
• Establish rights and provenance
Maine GeoArchives Project Components
Retention schedules
• Geospatial data
• Administrative records
Record accessioning
Appraisal system
System documentation
Archival data and metadata standards
Rules for disposition of local government records
Maine GeoArchives: Functional Requirements
Adopted set of functional requirements for recordkeeping systems to ensure permanent retention of data layers
• Compliance
• Responsible
• Credibility
• Completeness
• Authenticity
• Soundness
• Auditability
• Availability
• Exportable
• Renderable
• Redactable
Key issues
What are the points of intersection between archive needs and business continuity/disaster preparedness and other business needs?
How to best stimulate and learn from innovation at the state/regional/local level?
How to make data more preservable from point of production and on through data transfer?
How to most effectively move data in an efficient, well-documented manner with clarified rights?
How to best make State Archives a part of spatial data infrastructure?
Defining the record: data vs. derivative components
Cultural: Changing Industry Thinking
Is the geospatial industry “temporally-impaired?”
• Lack of access to older data
• Lack of tool/model support for temporal analysis
• Metadata: poor support for changing data
• Education: building class projects around available data (i.e., not temporal)
Increased interest now in temporal applications?
• Increased demand for temporal data?
• Improved tool support: ArcGIS 9.2 animation tools; Geodatabase History, etc.
Questions?
Contact:
Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) [email protected]
http://www.lib.ncsu.edu/ncgdap