
ESRI User Conference, August 8, 2006

Long-term archiving of geospatial data: the NGDA project

Julie Sweetkind-Singer

John Banning

Stanford University


The Library of Congress and NDIIPP (National Digital Information Infrastructure and Preservation Program)

$100 million from Congress, Dec. 2000.

1st round of funding announced Sept. 30, 2004
– 8 grants funded for nearly $14 million

2nd round of funding announced May 6, 2005
– 10 awards totaling $3 million (in conjunction with NSF)


Funded Geospatial Projects

North Carolina State University: preservation of geospatial data from state and government agencies in North Carolina
– Main partner: North Carolina Center for Geographic Information & Analysis

University of California at Santa Barbara: formation of a national geospatial federated digital repository
– Main partner: Stanford University

Total of both awards: $3.1 million


What is meant by digital preservation?

“Reliable long-term access to managed digital resources to its designated communities, now and in the future.” (RLG/OCLC, 2002)

Trusted digital repository attributes


Key non-technical elements

Collection development
– Assessing scope
– Assessing risk

Contracts
– Rights / use of materials

Cost of acquiring data

Increasing the size of the collecting network


Key technical elements

Large data sets

Versioning

Variety and complexity of formats

Proprietary file formats

Need for format information and specifications

Federation


External contacts

California Spatial Information Library (CASIL)

David Rumsey Collection

California Geological Survey

Katrina Image Warehouse

DigitalGlobe and GeoEye

ESRI


Technical Architecture-Stanford


Technical Architecture-UCSB

[Diagram: UCSB architecture components: Web access and ingest; ADL, OAI, and bulk-loader interfaces; databases, caches, etc.; archival system; standard, public data model; storage subsystem]


What is a format?

“A serialization of an abstract information model”
– A set of syntactic and semantic rules for mapping from an information model to a byte stream (and, in most instances, for mapping back).

Without knowledge of its format, a digital object is merely a collection of undifferentiated bits.
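To make the definition concrete, here is a minimal Python sketch. The information model (a labelled longitude/latitude point) and its byte layout are hypothetical, invented only for illustration, and do not correspond to any real geospatial format. The `serialize` function maps the model to a byte stream, `deserialize` maps it back, and the last line reads the same bytes under the wrong rules to show that the bits alone carry no meaning.

```python
import struct
from dataclasses import dataclass


# Hypothetical information model: a labelled point in longitude/latitude.
@dataclass
class Point:
    label: str
    lon: float
    lat: float


# Hypothetical format rules: little-endian 2-byte label length,
# UTF-8 label bytes, then two 8-byte IEEE 754 doubles (lon, lat).
def serialize(p: Point) -> bytes:
    label = p.label.encode("utf-8")
    return struct.pack("<H", len(label)) + label + struct.pack("<2d", p.lon, p.lat)


def deserialize(data: bytes) -> Point:
    (n,) = struct.unpack_from("<H", data, 0)
    label = data[2:2 + n].decode("utf-8")
    lon, lat = struct.unpack_from("<2d", data, 2 + n)
    return Point(label, lon, lat)


p = Point("Stanford", -122.17, 37.43)
raw = serialize(p)
assert deserialize(raw) == p  # mapping back recovers the model exactly

# The same bytes read with different rules yield meaningless numbers.
misread = struct.unpack_from("<2d", raw, 0)
print(raw.hex(), misread)
```

Only the reader who knows the layout recovers the point; anyone else sees undifferentiated bits.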


What is a Format Registry?

Definition
– The registry is a central location where format information is stored and maintained in a controlled manner.
– This includes: Identifiers, Responsibility, Classification, Relationships, Specifications, Signatures, Grammar, Tools, and Assessment

Why do we need one?
– Formats become obsolete over time.
– We need machine-actionable validation of formats.
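A registry entry can be pictured as a record with one field per category listed above, and "machine-actionable" validation as matching byte signatures against a file. The Python sketch below assumes a hypothetical in-memory registry; the `FormatRecord`, `Signature`, and `identify` names and the sample field values are illustrative only and are not the schema of GDFR, PRONOM, or the NGDA registry. The one signature shown (the shapefile file code 9994 stored big-endian at byte 0) comes from the ESRI Shapefile Technical Description.

```python
from dataclasses import dataclass, field


@dataclass
class Signature:
    offset: int     # byte offset where the pattern must appear
    pattern: bytes  # exact byte sequence expected at that offset


# Hypothetical record: one field per category named on the slide.
@dataclass
class FormatRecord:
    identifiers: list[str]
    responsibility: str
    classification: str
    relationships: list[str] = field(default_factory=list)
    specifications: list[str] = field(default_factory=list)
    signatures: list[Signature] = field(default_factory=list)
    grammar: str = ""
    tools: list[str] = field(default_factory=list)
    assessment: str = ""

    def matches(self, data: bytes) -> bool:
        # A file matches only if the record has signatures and all of them hit.
        return bool(self.signatures) and all(
            data[s.offset:s.offset + len(s.pattern)] == s.pattern
            for s in self.signatures
        )


# Hypothetical sample registry with a single shapefile entry.
REGISTRY = [
    FormatRecord(
        identifiers=[".shp"],
        responsibility="ESRI",
        classification="geospatial vector",  # illustrative label
        relationships=[".shx", ".dbf", ".prj"],
        specifications=["ESRI Shapefile Technical Description"],
        signatures=[Signature(0, (9994).to_bytes(4, "big"))],
        tools=["ArcGIS", "ArcView 3.0"],
    ),
]


def identify(data: bytes) -> list[FormatRecord]:
    """Return every registered format whose signatures match the bytes."""
    return [r for r in REGISTRY if r.matches(data)]
```

Keeping signatures as data rather than code lets a registry add or correct format knowledge without changing the tools that consult it.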


Goals of a Format Registry

Interpret the information content of a digital object properly.

Effective use, interchange, and preservation of all digitally-encoded content.


Current Efforts in Format Registries

Global Digital Format Registry (GDFR)

Digital Formats Web (Library of Congress)

PRONOM (UK)

NGDA (geospatial)

Long Now Foundation


Geospatial Example: ESRI Shapefile

1. ESRI Shapefile Technical Description white paper

2. dBase specification

3. References to various geospatial metadata standards

4. Additional documentation, specifications or statements on the various files that may be used as part of shapefiles (.sbn, .sbx, .prj, .xml, .fbn, .fbx)


5. Identifiers – “.shp”

6. Responsibility – ESRI, 380 New York Street, Redlands, CA 92373

7. Tools – ArcGIS, ArcView 3.0, etc.

Link to existing Format Registry: http://www.ngda.org/format/
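The kind of specification a registry entry points to (item 1 above) is what makes validation possible. As a sketch, the Python below checks the 100-byte main file header of a `.shp` file using the layout in the ESRI Shapefile Technical Description: file code 9994 (big-endian) at byte 0, file length in 16-bit words (big-endian) at byte 24, version 1000 and shape type (little-endian) at bytes 28 and 32, and the bounding box as little-endian doubles starting at byte 36. The helper names `read_shp_header` and `build_empty_header` are hypothetical and are not ESRI code.

```python
import struct

# Shape type codes listed in the ESRI Shapefile Technical Description.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon",
               8: "MultiPoint", 11: "PointZ", 13: "PolyLineZ",
               15: "PolygonZ", 18: "MultiPointZ", 21: "PointM",
               23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
               31: "MultiPatch"}

HEADER_SIZE = 100  # bytes in the main file header


def read_shp_header(data: bytes) -> dict:
    """Validate and decode a .shp main file header (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("shorter than the 100-byte header")
    (file_code,) = struct.unpack_from(">i", data, 0)           # big-endian
    (length_words,) = struct.unpack_from(">i", data, 24)       # big-endian
    version, shape_type = struct.unpack_from("<2i", data, 28)  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack_from("<4d", data, 36)
    if file_code != 9994:
        raise ValueError(f"bad file code {file_code}")
    if version != 1000:
        raise ValueError(f"unexpected version {version}")
    if shape_type not in SHAPE_TYPES:
        raise ValueError(f"unknown shape type {shape_type}")
    return {"length_bytes": length_words * 2,  # length is stored in 16-bit words
            "shape_type": SHAPE_TYPES[shape_type],
            "bbox": (xmin, ymin, xmax, ymax)}


def build_empty_header(shape_type: int, bbox: tuple) -> bytes:
    """Build a header-only file (100 bytes = 50 words) for testing."""
    # File code, five unused integers, then file length, all big-endian.
    head = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, HEADER_SIZE // 2)
    body = struct.pack("<2i", 1000, shape_type)
    ranges = struct.pack("<8d", *bbox, 0.0, 0.0, 0.0, 0.0)  # Z and M ranges zeroed
    return head + body + ranges


hdr = build_empty_header(5, (-122.5, 37.2, -121.9, 37.9))
print(read_shp_header(hdr))
```

The mixed byte order (big-endian for the file code and length, little-endian elsewhere) is exactly the kind of detail that is lost if the specification is not preserved alongside the data.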


Goals of the Project

Create robust preservation environments

Save at-risk data

Write a collection development policy

Start a geospatial format registry

Develop guidelines for preservation of geospatial materials

Agree upon guidelines for participation in the NGDA


Relevant contact information

Julie Sweetkind-Singer

John Banning

NGDA Web site– www.ngda.org

NDIIPP Web site– www.digitalpreservation.gov