North Carolina Geospatial Data Archiving Project/NDIIPP:
Collection and preservation of at-risk digital geospatial data
Partners:
NCSU Libraries, Project Lead: Steve Morris
NC Center for Geographic Information & Analysis, Project Lead: Zsolt Nagy
NSDI Partnership Community Meeting March 1, 2006
Note: Percentages based on the actual number of respondents to each question 2
Outline
Risks to Digital Geospatial Data
Overview of NC Geospatial Data Archiving Project and NDIIPP
Preservation Challenges and Possible Solutions
Points of Engagement with Spatial Data Infrastructure and Industry
Risks to Digital Geospatial Data
.shp
.mif
.gml
.e00
.dwg
.dgn
.bsb
.bil
.sid
Risks to Digital Geospatial Data
Producer focus on current data: archiving data does not guarantee “permanent access”
Future support of data formats in question: need to migrate formats or allow for emulation
Data failure: “bit rot”, media failure
Preservation metadata requirements: descriptive, administrative, technical, DRM
Shift to “streaming data” for access
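One common way to detect the "bit rot" risk listed above is a periodic fixity check: record a checksum for every archived file, then recompute and compare later. The sketch below is illustrative only and is not taken from the project; the function names (`sha256_of`, `build_manifest`, `find_damaged`) and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root, manifest):
    """List files that are missing or whose checksum no longer matches."""
    root = Path(root)
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return sorted(damaged)
```

A manifest built at ingest time can be stored with the preservation metadata, and `find_damaged` run on a schedule against each storage copy.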
Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC
Temporal data to support business needs in:
Real estate analysis
Land use change analysis
Economic planning
Time series – Ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002
Even static orthophotos are at risk.
Today’s geospatial data as tomorrow’s cultural heritage
Future uses of data are difficult to anticipate (as with Sanborn Maps).
NC Geospatial Data Archiving Project
Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
One of 8 initial NDIIPP partnerships (only state project)
Focus on state and local geospatial content in North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
Objective: engage existing state/federal geospatial data infrastructures in preservation
Targeted Content
Resource Types: GIS data (vector, etc.); digital orthophotography; digital maps; tabular data (e.g. assessment data)
Content Producers: mostly state, local, regional agencies; some university, not-for-profit, commercial; selected local federal projects
Work plan in a Nutshell
Work from existing data inventories
NC OneMap Data Sharing Agreements as the “blanket”, individual agreements as the “quilt”
Partnership: work with existing geospatial data infrastructures (state and federal)
Technical approach
Metadata: FGDC, METS, PREMIS?, GeoDRM?
Repository-independent: DSpace initially
Web services consumption for archival development (in future?)
NCGDAP Philosophy of Engagement
Take the data as is, in the manner in which it can be obtained
Provide feedback to producer organizations/inform state geospatial infrastructure
“Wrangle” and archive data
Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’ – the process, the learning experience, and the engagement with industry and infrastructure are more important than the archive
… What is the long-term solution?
Big Technical Challenges
Format migration paths
Management of data versions over time
Preservation metadata
Harnessing geospatial web services
Preserving cartographic representation
Keeping content repository-agnostic
Preserving geodatabases
More …
Vector Data Format Issues
Vector data much more complicated than image data
‘Archiving’ vs. ‘Permanent access’: an ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology, then it does not provide permanent access
Piles of XML need to be widely understood piles
GML: need widely accepted application schemas (like OSMM?)
The Geodatabase conundrum: export feature classes, and lose topology, annotation, relationships, etc.
… or use the Geodatabase as the primary archival platform (some are now thinking this way)
Managing Time-versioned Content
Continuously updated data: frequency of snapshots? Different for various framework layers?
Metadata Availability – Limited at Local Level
February 2005
Harnessing Geospatial Web Services
Image atlases from WMS services?
Capturing cartographic representation?
Recording records from decision-making processes?
Later: data transfer via WFS & GML? Other?
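As a rough illustration of the "image atlases from WMS services" idea, an archive could split an area of interest into a grid and issue one WMS GetMap request per cell. The sketch below only builds the request URLs and fetches nothing. It assumes WMS 1.1.1 parameter names (`SRS`, with `BBOX` ordered as minx,miny,maxx,maxy); the endpoint, the layer name, and the helper names `atlas_bboxes` and `getmap_url` are hypothetical.

```python
from urllib.parse import urlencode


def atlas_bboxes(extent, columns, rows):
    """Split (minx, miny, maxx, maxy) into a row-major grid of cell boxes."""
    minx, miny, maxx, maxy = extent
    dx = (maxx - minx) / columns
    dy = (maxy - miny) / rows
    return [
        (minx + c * dx, miny + r * dy, minx + (c + 1) * dx, miny + (r + 1) * dy)
        for r in range(rows)
        for c in range(columns)
    ]


def getmap_url(base_url, layer, bbox, width=1024, height=1024,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one atlas cell."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks the server for its default style
        "SRS": srs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
urls = [
    getmap_url("https://example.org/wms", "orthophoto_2002", cell)
    for cell in atlas_bboxes((-79.0, 35.5, -78.0, 36.5), columns=2, rows=2)
]
```

Storing the request parameters alongside each captured image would record which layer, style, and extent the picture represents.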
“Web mash-ups” and the New Mainstream Geospatial Web Services
How does temporal data fit into emerging WMS caching and tiling schemes?
Capture of tiles and caches for archive?
Preserving Cartographic Representation
Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
Needed: Efficient Content Replication
Content replication also needed for: disaster preparedness; state and federal data improvement projects; aggregation by regional geospatial web service providers
WFS, e.g.: efficiency in complete content transfer?
Need rsync-like function, informed by: rights management, inventory processes, metadata management, data update cycles
Archiving delta files vs. complete replication – need to avoid requiring “digital archaeology” in the future
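One possible reading of the "rsync-like function" is a comparison of checksum manifests, so that only new or changed files are transferred. The sketch below assumes each side can produce a mapping of relative path to checksum. The name `plan_transfer` and the manifest format are assumptions, and the slide's other inputs (rights management, inventories, update cycles) are not modeled.

```python
def plan_transfer(source, archive):
    """Compare two {path: checksum} manifests and report what differs.

    'added' and 'changed' files would need to be copied to the archive;
    'removed' files exist only in the archive and are reported, not deleted,
    since an archive normally retains superseded content.
    """
    source_paths = set(source)
    archive_paths = set(archive)
    return {
        "added": sorted(source_paths - archive_paths),
        "changed": sorted(
            path
            for path in source_paths & archive_paths
            if source[path] != archive[path]
        ),
        "removed": sorted(archive_paths - source_paths),
    }


# Illustrative manifests with made-up checksums.
producer = {"parcels.shp": "a1", "parcels.dbf": "b2", "roads.shp": "c3"}
archived = {"parcels.shp": "a1", "parcels.dbf": "b0", "zoning.shp": "d4"}
plan = plan_transfer(producer, archived)
```

Archiving each complete file that the plan lists, rather than a byte-level delta, keeps every snapshot readable on its own, which bears on the slide's concern about future "digital archaeology".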
Points of Engagement with the Open Geospatial Consortium (OGC)
GML for archiving (PDF/A version of GML?)
GeoDRM: adding preservation use cases
Content Packaging: will there be an industry solution?
Web Map Context Documents: can we save data state as well as application state?
Content Replication: is this a layer in the overall architecture?
Persistent Identifiers
Points of Engagement with Spatial Data Infrastructure
Framework data communities: snapshot frequency, naming schemes, classification, GML application schemas, format strategies
Metadata standards and outreach: persistent identifiers, versioning, feedback on metadata quality
Content replication/transfer: for data improvement projects, disaster preparedness, aggregation by regional service providers, … and archives
Where do archiving and preservation fit into the NSDI, GOS, etc.?
Points of Engagement with Industry
Software vendors
Better support for temporal data management
Tools for retrospective data conversion
Web mashup and open source communities
WMS caching schemes
Standard tiling schemes with temporal component?
Data vendors
Cultivate market for older data (scaled pricing?)
Tech transfer on archiving practices?
Project Status
Cultivating a market for older data.
Project Status
Cultivating tools for retrospective conversion.
Expected Project Outcomes
Demonstration archive
Outreach activity – planting seeds: international, national, state, local, commercial
Learning experience, informing:
Spatial data infrastructure
Commercial vendors (data/software/consulting)
Repository software communities
Metadata practice (both GIS & preservation)
Rights management developments
Data and interoperability standards
Project Status
Storage system and backup deployed
DSpace deployed
FGDC Metadata workflow finalized
Ingest workflow near finalization
Content migration workflow plan near finalization
Regional site visits planned for coming months
Wide range of outreach/collaboration: FGDC, ESRI, EDINA (JISC), USGS, OGC, TRB, etc.
Pilot project, georegistering digital archival geologic maps
Questions?
Contact:
Steve Morris
Head, Digital Library Initiatives
NCSU [email protected]
Web site: http://www.lib.ncsu.edu/ncgdap/