Upload
clifford-arnold
View
224
Download
0
Tags:
Embed Size (px)
Citation preview
Next Generation Archives: The NC Geospatial Data Archiving Project
Jeff EssicGeospatial Data Services LibrarianNorth Carolina State University Libraries
NACIS 2008 October 10, 2008
2
NC Geospatial Data Archiving Project (NCGDAP)
Three year partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
One of 8 initial NDIIPP collection building partnerships
Focus on state and local geospatial content in North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
3
NCGDAP Specifics
Funding:$520,000 for 2005-2007$500,000 for 18 month extension
Staff:1.5 FTE at NCSUApprox. same at NCCGIA
Website: http://www.lib.ncsu.edu/ncgdap
4
Selected Geospatial Data Archive ProjectsProject Organizations Fundin
g
Persistent Archives Testbed
San Diego Supercomputer Center, NARA
NARA
VanMap San Diego Supercomputer Center
Inter-PARES
Geospatial Repository for Academic Deposit & Extraction
EDINA JISC
Geospatial Electronic Records
CIESIN NHPRC
various Carleton University various
National Geospatial Digital Archive
UC Santa Barbara NDIIPP
Maine GeoArchives State of Maine NHPRC
5
Tracking data, map servers, and web services since 2000
Ranked 3rd in traffic among entry points to entire library website
Persistent identifiersusage trackingID links used in other sites
Community help in site maintenance
Project Roots: NCSU Libraries Data Directory
6
0
10
20
30
40
50
60
70
80
90
100
2000 2001 2002 2003 2004 2005 2006 2007 2008
Nu
mb
er
of
Co
un
tie
s
Map Server
Data Download
WMS
100 Counties in North Carolina
County Map and Data Services in NC
Carrboro, NC : Population 17,797 (2005 est.)
24 downloadable GIS data layers
4 WMS data layers
6 web mapping applications
9 downloadable PDF map layers
8
Value in Older Data: Cultural Heritage
Future uses of data are difficult to anticipate (as with Sanborn Maps)
9
Downtown Raleigh Near State Capitol
1914 Sanborn Map
10
Downtown Raleigh Near State Capitol
1993 DOQQ
11
Downtown Raleigh Near State Capitol
1999 Wake County Ortho
12
Downtown Raleigh Near State Capitol
2005 Wake County Ortho
13
Downtown Raleigh Near State Capitol
2005 Wake County Ortho
Imagery = DurableStatic Simple structureMostly open formats
Vector data = VolatileFrequent updateComplex structureMostly proprietary formats
Downtown Raleigh Near State Capitol
2005 Wake County Ortho
Imagery = DurableStatic Simple structureMostly open formats
Vector data = VolatileFrequent updateComplex structureMostly proprietary formats
14
Geospatial Data Types – Cartographic
• GIS Software– Software project file (.mxd, .apr, …)– Data layer file (.avl, .lyr, …)
• PDF, GeoPDF map exports• Web Services-based representations
15
Geospatial Data Types – Spatial Databases
• Vector, raster, and tabular data
• Relationships• Behaviors• Annotation• Data Models
16
Other Geospatial Data Types – Place-based DataStreet Views Oblique Imagery
3D Images
Present-day value in location-based services and mobile applications
Future value for cultural heritage, descriptions of places
Tax Dept. Photos
17
Other Geospatial Data Types: Web 2.0 Mashups
18
Geospatial Data: Compelling Issues
Dynamic contentConstantly updated information
Data versioning
Digital object complexitySpatially enabled databases
Complicated, multi-component formats
Proprietary formats
19
Digital Preservation Points of Failure
Data is not saved, or …
can’t be found, or …
media is obsolete, or …
media is corrupt, or …
format is obsolete, or …
file is corrupt, or …
meaning is lost
20
Risks to Geospatial Data
Producer focus on current dataData overwrite as common practice
Future support of data formats in questionNo open, supported format for vector data
Shift to web services-based accessData becoming more ephemeral
Inadequate or nonexistent metadataImpedes discovery and use
Increasing use of spatial databases for data management
The whole is greater than the sum of the parts
21
Preservation Business Case
Land use change analysis
Site location analysis
Real estate trends analysis
Disaster response
Resolution of legal challenges
Impervious surface change mapping
22
Business Case: Identifying Land Use Changes
Use case: Land use and impervious surface change analysis
1993
2005
1998
2002
1999
23
24
Geospatial Data Preservation Challenges
Data CaptureBackups are common, but not long-term archives
Producer focus is on current data
Shift to web services-based access
Inadequate or Nonexistent MetadataConsistent NC survey stats: Only 40% of data producers create and maintain metadata
25
Challenge: Vector Data Formats
No widely-supported, open vector formats for geospatial data
Spatial Data Transfer Standard (SDTS) not widely supported
Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”
Spatial DatabasesThe whole is more than the sum of the parts, and the whole is very difficult to preserve
Can export individual data layers for curation, but relationships and context are lost
26
Challenge: Digital Object Complexity
XML DatabaseExport
XML DatabaseExport
TIFF Images •Pixel Value and Header file•World file•Coordinate System file•Metadata file
Shapefiles•Geometry file•Index file•Attribute file•Metadata file•Coordinate System file•Spatial Index files
Potential Ingest Objects
Files
• Multi-file dataset• Georeferencing• Metadata file• Symbols file• Additional documentation• License• Disclaimer• More
Metadata
• FGDC• Acquisition metadata• Transfer metadata • Ingest metadata• Archive rights• Archive processes• Collection metadata• Series metadata
Metadata Exchange Format (MEF) in GeoNetwork a form of content packaging
27
Challenge: Cartographic Representation
Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
28
Other Challenges
Rights managementData versioningSemantic issuesContent PackagingLarge scale content transferIntegrating older analog materialsMore …
29
Different Ways to Approach Preservation
Technical solutions: How do we preserve acquired content over the long term?
Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?
Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata
30
Question: Frequency of Capture?
Content Exchange – Getting Data in Motion
Repository Development
Repository of Temporal Data Snapshots
31
Frequency of Capture
Issue: How frequently should county and municipal vector data layers be captured in archives?
Parcels, centerlines, jurisdictions, zoning, …
Parcel Boundary Changes 2001-2004, North Raleigh, NC
32
Frequency of Capture SurveysHow often should continually changing vector datasets be captured?Tap into data custodian understanding of production patterns and usesTap into local innovationLearn about local business drivers for data archiving
2006 and 2008 surveys of NC cities and counties2008 survey of archival practice in state agencies in NCPlanned survey of data users in NC
33
FOC 2006 Survey Results: Overview
58% response, two-thirds of whom create and retain periodic snapshotsLong-term retention more common in counties with larger populationsStorage environments vary, with servers and CD-ROMs most commonWide variation in frequencies of capture.Offsite storage (or both onsite and offsite) is used by nearly half of the respondentsPopularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents
34
Content Exchange Infrastructure
High volume of state/federal requests for local dataSolving the present-day problems of data sharing is a pre-requisite to solving the problem of long-term accessLeveraging more compelling business reasons to put the data in motion (disaster preparedness, business continuity, highway construction, census, …)Content exchange networks:
Minimize need to make contactAdd technical, administrative, descriptive metadataEstablish rights and provenance
35
Content Exchange Infrastructure
Nov. 2007: NC Geographic Information Coordinating Council (GICC):
Ten Recommendations in Support of Geospatial Data Sharing released
Recommendation: “Establish archive and long term data access strategies”Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”
http://www.ncgicc.org/CurrentActivities/TenRecommendationsinSupportofGeospatialData/tabid/156/Default.aspx
36
Getting the Data in Motion
Harvesting use cases for older data as part of outreach
Factors Driving Capture of Temporal Data
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
IT policy Recordsretention
policy
Tax adminrules
Land usechangeanalysis
Resolutionof legalissues
Historicmapping
Other
% o
f R
es
po
nd
en
ts
Survey of current archiving practice among NC counties and municipalities
37
Most costly part of archive development is identifying, negotiating acquisition, and then transferring data
Important ObjectivesMinimize Direct ContactDocument DataClarify RightsRoutinize Transfers
Leverage other business uses that put data in motion:
Continuity of operationsHighway PlanningFloodplain Mapping
Getting the Data in Motion
38
Getting the Data in Motion
OrthophotoData DistributionSystem – “sneakernet”
Transfer of large quantities of imagery
Street Centerline Data Distribution System
Efficient transfer of data from 100 counties, with metadata and clarified rightshttp://www.ncstreetmap.com
NC GIS Inventory
• Efficient data identification• Adding preservation elements
NC OneMap Data Download and Viewer
• Public access• Data visualization
39
Repository Development
Downloading or acquiring “low hanging fruit”
Tapping into current data flows
Developing our own metadata when necessary
Converting and preserving vector data in shapefile format
40
Data Preservation
Complex data representations can be made more preservable (yet less useful) through simplification.
Conversion of various formats to shp
Image outputs (web services,
PDF maps, map image files)
Very hard to preserve:Software project files
Symbol sets
What about symbology meanings?
Layer definitions
Web service or API interactions
41
Desiccated Data: PDF and GeoPDF
Cartographic outputs – analogous to paper maps
CombineDatasets
Data models
Classification
Symbolization
Annotation
More data intelligence
than in simple images
42
Desiccated Data: PDF and GeoPDF
Explosion of geospatial PDF content in past few years
Standards issuesGeoPDF: TerraGo technology has withdrawn patent claim and is approaching OGC about open standards process
PDF: open ISO standard with subset of geospatial functionality in ISO PDF standard part 2
Open PDF variants created through ISO standards process (PDF/E, PDF/X, PDF/A, …)
PDF content retained in addition to, NOT instead of data
43
Cartographic Preservation Side Project:
1:500,000 – 1:2.5 M1:31,680 – 1:430,000
1,200 – 24,000
Scanned, georeferenced, and compressed over 286 NC geologic maps, in cooperation with NC Geologic
Survey
44
Repository Status
Acquired 6+ TB of data with more on the way
Disk space being used initially for “data staging”Inventorying
In the process of ingesting content into DSpaceMetadata generation
45
Engaging Spatial Data Infrastructure
Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be archived—from point of production?
Engage and outreach to the data producer community and SDISell the problem to software vendors and standards developmentFind overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc.Discuss roles at the local, state, and federal level
46
Data inventories support content identification
Metadata standards support discoverability and use
Content standards support data interoperability over time and help eliminate semantic confusion
Data exchange networks:Minimize need to make contactAdd technical, administrative, descriptive metadataEstablish rights and provenance
SDI Role in Data Preservation
47
NC Spatial Data Infrastructure: NC OneMap
Next generation mechanism to coordinate and disseminate geographic information in North Carolina and interact with the NSDI.
NC GICC
Inventory for all geospatial data holdings – http://nc.gisinventory.net
Develop content standards for key data themes
One of the defined characteristics of
NC OneMap is that “Historic and
temporal data will be maintained
and available”.
48
Archival and Long Term Access Working Group
Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to dataFederal, state, regional, and local agency representationKey focus
Best practices for data snapshots and retentionState Archives processes: appraisal, selection, retention schedules, etc.
Valuable outcome of NCGDAP – multiple parties and levels discussing data archiving on their own.
49
Archival and Long Term Access Working Group
Final Report to be presented to GICC in Nov.
Best Practices for: Archiving Schedule
Inventory
Storage Medium
Formats
Naming
http://www.ncgicc.org/CurrentActivities/ArchivalandLongTermAccessadhocCommittee/tabid/306/Default.aspx
Metadata
Distribution
Periodic Review
Data Integrity
Publicity
50
NDIIPP Multi-State Geospatial Project
Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC
Partners:Leading state geospatial organizations of Kentucky and Utah
State Archives of Kentucky and Utah
NCSU Libraries in catalytic/advisory role
State-to-state and geo-to-Archives collaboration
2 year project: Nov. 2007-Dec. 2009
Archives as part of Spatial Data Infrastructure
51
OGC Data Preservation Working Group
Formed Dec. 2006
Engage archival community
Find points of intersection with other OGC activities:
GML for archiving
Content packaging
Large scale data transfers
Time in decision support
52
Cultural: Changing Industry Thinking
Is the geospatial industry “temporally-impaired?”Lack of access to older dataLack for tool/model support for temporal analysisMetadata: poor support for changing dataEducation: building class projects around available data (i.e., not temporal)
Increased interest now in temporal applications?Increased demand for temporal data?Improved tool support: ArcGIS 9.2 animation tools; Geodatabase History, etc.Emerging commercial market in older data
53
“Supporting temporal analysis requirements” gets more attention than “archiving and preservation”
Leverage existing infrastructure
Current data sharing needs drive infrastructure improvements that help archiving
Leverage business needs that are more compelling than preservation (e.g., continuity of operations)
Facilitate stakeholder ownership of the solutions
Mine state and local archiving innovations
Conclusions
54
Slide Presentation:
http://www.lib.ncsu.edu/ncgdap/presentations.html
Steve Morris Jeff EssicHead, Digital Library Initiatives Geospatial Data Services LibrarianNCSU Libraries NCSU Librariesph: (919) 515-1361 ph: (919) [email protected] [email protected]