1
Data “Bundle” Received/Retrieved from Agency Metadata Included? Agency/Layer Metadata Template Create Copy of Agency Provided Metadata Metadata Pre-processing Original Version Administrative Metadata Element Template Attach Administrative Components to Metadata Record Normative Metadata Record Ingest Processes Create Minimal Metadata Record NO YES Digital Data Time Series of Land Use Changes Many of the vector data layers to be acquired are subject to frequent update. County cadastral (land parcel) datasets, for example, are typically updated on a daily or weekly basis. Such time-versioned content, if preserved, can form the basis of time series analyses such as land use change analysis. Version-handling over time, however, can be quite difficult to manage within the archive. Not only does the management of a serial data set require close attention to storage structure and archival process controls, but experience in the content domain has shown that some resources only a few years old have already lived in two or three repository environments. Users may locate serial components not only separated from the parent collection in the archive, but also in a system environment incompatible with older components of the collection. Comparison of Wake County, NC parcels and street centerlines near Falls Lake and I-540 March 5, 1997 (left) and Feb. 22, 2005 (right). One Possible Approach One possible approach involves management of a serial level entity to which individual versions could point via a persistent identifier such as a Handle. The persistent identifier, by pointing to the serial record manager, could then form the basis for "get current object," "get current metadata," or "get current rights" requests. Volatile elements such as location of the current data version or associated map service could be managed in the serial record rather than the repository item and would also be available once the item leaves the repository. One drawback: this approach would assume a long-term commitment to maintenance of the serial object manager. On the other hand, this approach might simplify the task of version management within the repository and help avoid expensive customizations that risk the "imprinting" of the collection on that software environment. North Carolina Geospatial Data Archiving Project (NCGDAP) Key Issues and Findings in Work to Date (July 2005) Digital Geospatial Metadata Standards In the U.S., most GIS data and much remote sensing data are documented using the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata, which has been a federal standard since 1994. "FGDC Metadata," as it is often referred to, encompasses a broad range of descriptive, technical, and administrative elements. The collection of geospatial metadata was mandated in the U.S. by Executive Order 12906, which instructed federal agencies to document new geospatial data beginning in 1995. The standard was also widely adopted by state agencies as a means to expose content through the National Spatial Data Infrastructure (NSDI). Creation of FGDC metadata is less common at the local agency level. In North Carolina, the NC OneMap Metadata Outreach Program works to further the development of geospatial metadata at the county, city, and regional level. Presented by: Steve Morris, Rob Farrell, Jim Tuttle, Jeff Essic, and James Jackson Sanborn FALLS LAKE I-540 FALLS LAKE Technical Challenges In vector data, cartographic representation is typically encoded in proprietary files (.avl, .apr, .lyr, .mxd) that do not lend themselves well to conversion/migration and even face upward migration compatibility problems in their own software environments Image capture approaches that effectively capture the final map product in a stable, preservable form also involve discarding the underlying data and map intelligence Symbologies have meaning to particular communities at particular points in time, preserving information about symbol sets and their meaning is a different problem Possible Archival Approaches Image-based Approaches Vector-based Approaches The Added Value of Cartographic Representation The true counterpart to the old, preserved map is not the current GIS dataset but rather the cartographic representation that builds on that data. The representation is the result of a collection of intellectual choices and the application of current practice as regards symbolization, data modeling, annotation, etc. One goal of capturing cartographic representation would be to preserve data in the form that decision-makers and others encountered and interpreted it. Another goal, in the case of image capture approaches, would be to provide a stable, preservation-friendly (though “dumbed down”) alternative in the case of long-term failure in the vector data preservation process. The derived image might also serve as a content preview, helping future researchers decide whether to commit time and resources to do whatever “digital archeology” might be necessary to resurrect the content. Any preservation of cartographic representation would occur in addition to preserving the underlying data. Normalizing Metadata Upon Receipt/Retrieval Objective Normative metadata record to meet internal workflow requirements, project archival specifications, and compatibility with FGDC metadata content standard Concerns • Wide range of metadata receipt/retrieval scenarios • Metadata not always accurate representation of data set • Project requires minimally “complete” metadata record Primary Strategic Considerations • Retain original metadata record • Maintain FGDC content specifications and extend with ESRI Profile elements for additional technical metadata • Utilize available tools to maximize process efficiency Downstream Processing • Augment FGDC record with additional administrative and technical metadata and incorporate into METS • Map elements to DSpace ingest metadata • Ingest FGDC and METS records with data object Apex Cary Neuse Garner Raleigh New Hope Bethesda Morrisville t u 70 t u 1 t u 401 t u 64 t u 401 t u 70 t u 1 t u 401 t u 64 § ¨ ¦ 40 § ¨ ¦ 440 § ¨ ¦ 40 Raleigh-Durham International William B Umstead State Park William B Umstead State Park Carl Alwin Schenck Memorial Fo Carl Alwin Schenck Memorial Fo B. Everett Jordan Lake Lake Wheeler Falls Lake Reservoir Lake Benson C rabt re e Cre ek Neuse River Swift C r e e k Sw ift Creek North Carolina Vector data with Full Cartographic Representation Raleigh, North Carolina Vector data with Minimal Cartographic Representation Raleigh, North Carolina Archival Approach Issues Generation of images using map book tools, or harvest existing map books Results in a complex set of images linked through HTML grids. Compatible with capture of large extents. Automate capture of map book equivalents from Web Map Server (WMS) servers Potential for automated, agent-based capture. Availability of WMS services increasing. Export layout or map to image Does not provide the flexibility to handle large extents that the map book approach does, but does present the opportunity to capture agency-provided added value work. Archival Approach Issues Archive relevant proprietary files: .avl, .lyr, .mxd Proprietary map project and layer representation files tend to be less open and often lack upward compatibility across major software revisions. Store explicitly in Geodatabase (an option from ArcGIS version 9.2) Option not yet available. Still requires storing representation information in a proprietary format. Export to SVG or other open, XML-based option Piles of XML need to be well understood piles of XML in order to support permanent access (as opposed to requiring digital archaeology). Widely accepted profiles or standards would be needed. Findings: Geospatial Metadata “In the Wild” Issues: Managing Time-Versioned Content Issues: Preserving Cartographic Representation Findings: Spatial Database Technology Advancement and Proliferation Spatial Database Technology Overview A spatial database stores geographic features and attributes as objects hosted inside a relational database management system. Multiple data layers may be stored in a single database, which may also host elements such as topology, relationships, behaviors, and annotation that are not exportable to conventional vector file formats. Within the project domain, the ESRI Geodatabase format is a prominent example of this approach to data management. The initial project strategy with regard to the ESRI Geodatabase was rather cautious, focusing rather narrowly on export of data layers into more preservable vector formats. Evolution of the software functionality associated with the format is resulting in new preservation options, including XML export, Geodatabase replication, reduced dependency on proprietary database backends. Agency Use of ESRI Geodatabases Until recently, spatial databases were relatively rare in the project domain, but local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data. According to the NC CGIA 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in that format. The charts depict selected results regarding Geodatabase proliferation in North Carolina local government. Project Stage Planned Approach Original Proposal (Nov. 2003) Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size Finalized Work Plan (Dec. 2004) Also export content as Geodatabase XML Possible Future Work Plan Changes Explore maintenance of some archival content in Geodatabase form; explore Geodatabase replication as an archive development approach; archive Geodatabases of unlimited size Modifications to Project Geodatabase Strategy The publication of data layer-level metadata by local agencies is on the rise. However, as the map graphic indicates, metadata does not yet readily exist statewide for one of the project’s primary acquisition targets: cadastral data. Metadata availability is even lower for other key data layers such as street centerlines. The NC OneMap Metadata Outreach Program, led by the NC Center for Geographic Information and Analysis, is working to expand the availability of metadata at the local level. The project will encounter metadata in various forms and levels of completion relative to the FGDC Content Standard for Digital Geographic Metadata. Furthermore, profile extensions (such as ESRI's) to the FGDC standard have been implemented with some regularity, as indicated by the pie chart on the right, further complicating the metadata preparation process. Legend NCOneMap Metadata Outreach Metadata Published On-line Geospatial Metadata Publication in North Carolina County Cadastral Layer Information On-Line r 0 50 100 25 Miles Metadata Inclusion of ESRI Profile Elements Yes No Cities: Street Centerline Formats Geodatabase Shapefile Coverage Other Counties: Street Centerline Formats Geodatabase Shapefile Coverage Other

North Carolina Geospatial Data Archiving Project (NCGDAP ... · centerlines. The NC OneMap Metadata Outreach Program, led by the NC Center for Geographic Information and Analysis,

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: North Carolina Geospatial Data Archiving Project (NCGDAP ... · centerlines. The NC OneMap Metadata Outreach Program, led by the NC Center for Geographic Information and Analysis,

Data “Bundle”Received/Retrieved

from Agency

MetadataIncluded?

Agency/LayerMetadataTemplate

Create Copy ofAgency Provided

Metadata

Metadata Pre-processing

OriginalVersion

AdministrativeMetadataElementTemplate

Attach AdministrativeComponents to

Metadata Record

NormativeMetadataRecord

IngestProcesses

Create MinimalMetadata Record

NO

YES

Digital Data Time Series of Land Use Changes

Many of the vector data layers to be acquired are subject to frequent update. County cadastral (land parcel) datasets, for example, are typically updated on a daily or weekly basis. Such time-versioned content, if preserved, can form the basis of time series analyses such as land use change analysis.

Version-handling over time, however, can be quite difficult to manage within the archive. Not only does the management of a serial data set require close attention to storage structure and archival process controls, but experience in the content domainhas shown that some resources only a few years old have already lived in two or three repository environments. Users may locate serial components not only separated from the parent collection in the archive, but also in a system environment incompatible with older components of the collection.

Comparison of Wake County, NC parcels and street centerlines near Falls Lake and I-540March 5, 1997 (left) and Feb. 22, 2005 (right).

One Possible Approach

One possible approach involves management of a serial level entity to which individual versions could point via a persistent identifier such as a Handle. The persistent identifier, by pointing to the serial record manager, could then form the basis for "get current object," "get current metadata," or "get current rights" requests. Volatile elements such as location of the current data version or associated map service could be managed in the serial record rather than the repository item and would also be available once the item leaves the repository. One drawback: this approach would assume a long-term commitment to maintenance of the serial object manager. On the other hand, this approach might simplify the task of version management within the repository and help avoid expensive customizations that risk the "imprinting" of the collection on that software environment.

North Carolina Geospatial Data Archiving Project (NCGDAP)Key Issues and Findings in Work to Date (July 2005)

Digital Geospatial Metadata Standards

In the U.S., most GIS data and much remote sensing data are documented using the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata, which has been a federal standard since 1994. "FGDC Metadata," as it is often referred to, encompasses a broad range of descriptive, technical, and administrative elements.

The collection of geospatial metadata was mandated in the U.S. by Executive Order 12906, which instructed federal agencies to document new geospatial data beginning in 1995. The standard was also widely adopted by state agencies as a means to expose content through the National Spatial Data Infrastructure (NSDI). Creation of FGDC metadata is less common at the local agency level. In North Carolina, the NC OneMap Metadata Outreach Program works to further the development of geospatial metadata at the county, city, and regional level.

Presented by: Steve Morris, Rob Farrell, Jim Tuttle, Jeff Essic, and James Jackson Sanborn

FALLS LAKE

I-540

FALLS LAKE

Technical Challenges

• In vector data, cartographic representation is typically encoded in proprietary files (.avl, .apr, .lyr, .mxd) that do not lend themselves well to conversion/migration and even face upward migration compatibility problems in their own software environments • Image capture approaches that effectively capture the final map product in a stable, preservable form also involve discarding the underlying data and map intelligence • Symbologies have meaning to particular communities at particular points in time, preserving information about symbol sets and their meaning is a different problem

Possible Archival Approaches

Image-based Approaches

Vector-based Approaches

The Added Value of Cartographic Representation

The true counterpart to the old, preserved map is not the current GIS dataset but rather the cartographic representation that builds on that data. The representation is the result of a collection of intellectual choices and the application of current practice as regards symbolization, data modeling, annotation, etc. One goal of capturing cartographic representation would be to preserve data in the form that decision-makers and others encountered and interpreted it. Another goal, in the case of image capture approaches, would be to provide a stable, preservation-friendly (though “dumbed down”) alternative in the case of long-term failure in the vector data preservation process. The derived image might also serve as a content preview, helping future researchers decide whether to commit time and resources to do whatever “digital archeology” might be necessary to resurrect the content. Any preservation of cartographic representation would occur in addition to preserving the underlying data.

Normalizing Metadata Upon Receipt/Retrieval

ObjectiveNormative metadata record to meet internal workflow requirements, project archival specifications, and compatibilitywith FGDC metadata content standard

Concerns• Wide range of metadata receipt/retrieval scenarios• Metadata not always accurate representation of data set• Project requires minimally “complete” metadata record

Primary Strategic Considerations• Retain original metadata record• Maintain FGDC content specifications and extend with ESRI Profile elements for additional technical metadata• Utilize available tools to maximize process efficiency

Downstream Processing• Augment FGDC record with additional administrative and technical metadata and incorporate into METS• Map elements to DSpace ingest metadata• Ingest FGDC and METS records with data object

Apex

Cary

Neuse

Garner

Raleigh

New Hope

Bethesda

Morrisville

tu70

tu1

tu401

tu64

tu401

tu70

tu1

tu401

tu64

§̈¦40

§̈¦440

§̈¦40

Raleigh-Durham International

William B Umstead State ParkWilliam B Umstead State Park

Carl Alwin Schenck Memorial FoCarl Alwin Schenck Memorial Fo

B. Everett Jordan Lake

Lake Wheeler

Falls Lake Reservoir

Lake Benson

Crabtree Creek

Ne u

s e R

iv er

Swift Creek

Swift Creek

N o r t h C a r o l i n a

Vector data with Full Cartographic RepresentationRaleigh, North Carolina

Vector data with Minimal Cartographic RepresentationRaleigh, North Carolina

Archival Approach Issues

Generation of images using map book tools, or harvest existing map books

Results in a complex set of images linked through HTML grids. Compatible with capture of large extents.

Automate capture of map book equivalents from Web Map Server (WMS) servers

Potential for automated, agent-based capture. Availability of WMS services increasing.

Export layout or map to image Does not provide the flexibility to handle large extents that the map book approach does, but does present the opportunity to capture agency-provided added value work.

Archival Approach Issues

Archive relevant proprietary files: .avl, .lyr, .mxd Proprietary map project and layer representation files tend to be less open and often lack upward compatibility across major software revisions.

Store explicitly in Geodatabase (an option from ArcGIS version 9.2)

Option not yet available. Still requires storing representation information in a proprietary format.

Export to SVG or other open, XML-based option Piles of XML need to be well understood piles of XML in order to support permanent access (as opposed to requiring digital archaeology). Widely accepted profiles or standards would be needed.

Findings: Geospatial Metadata “In the Wild” Issues: Managing Time-Versioned Content Issues: Preserving Cartographic Representation

Findings: Spatial Database Technology Advancement and Proliferation

Spatial Database Technology Overview

A spatial database stores geographic features and attributes as objects hosted inside a relational database management system.

Multiple data layers may be stored in a single database, which may also host elements such as topology, relationships, behaviors, and

annotation that are not exportable to conventional vector file formats. Within the project domain, the ESRI Geodatabase format is a

prominent example of this approach to data management.

The initial project strategy with regard to the ESRI Geodatabase was rather cautious, focusing rather narrowly on export of data layers into

more preservable vector formats. Evolution of the software functionality associated with the format is resulting in new preservation

options, including XML export, Geodatabase replication, reduced dependency on proprietary database backends.

Agency Use of ESRI Geodatabases

Until recently, spatial databases were relatively rare in the project domain, but local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data. According to the NC CGIA 2003 Local Government GIS Data Inventory, 10.0% of all county framework

data and 32.7% of all municipal framework data were managed in that format. The charts depict selected results regarding Geodatabase proliferation in North

Carolina local government.

Project Stage Planned Approach

Original Proposal (Nov. 2003) Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size

Finalized Work Plan (Dec. 2004) Also export content as Geodatabase XML

Possible Future Work Plan Changes Explore maintenance of some archival content in Geodatabase form; explore Geodatabasereplication as an archive development approach; archive Geodatabases of unlimited size

Modifications to Project Geodatabase Strategy

The publication of data layer-level metadata by local agencies is on the rise.However, as the map graphic indicates, metadata does not yet readily existstatewide for one of the project’s primary acquisition targets: cadastral data.Metadata availability is even lower for other key data layers such as streetcenterlines. The NC OneMap Metadata Outreach Program, led by the NCCenter for Geographic Information and Analysis, is working to expand theavailability of metadata at the local level.

The project will encounter metadata in various forms and levels of completionrelative to the FGDC Content Standard for Digital Geographic Metadata.Furthermore, profile extensions (such as ESRI's) to the FGDC standard havebeen implemented with some regularity, as indicated by the pie chart on theright, further complicating the metadata preparation process.

LegendNCOneMap Metadata Outreach

Metadata Published On-line

Geospatial Metadata Publication in North CarolinaCounty Cadastral Layer Information On-Line

r0 50 10025 Miles

Metadata Inclusion of ESRI Profile Elements

Yes

No

Cities: Street Centerline Formats

GeodatabaseShapefileCoverageOther

Counties: Street Centerline Formats

GeodatabaseShapefileCoverageOther