54
Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries NACIS 2008 October 10, 2008

Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

Embed Size (px)

Citation preview

Page 1: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

Next Generation Archives: The NC Geospatial Data Archiving Project

Jeff EssicGeospatial Data Services LibrarianNorth Carolina State University Libraries

NACIS 2008 October 10, 2008

Page 2: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

2

NC Geospatial Data Archiving Project (NCGDAP)

Three year partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)

One of 8 initial NDIIPP collection building partnerships

Focus on state and local geospatial content in North Carolina (state demonstration)

Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories

Page 3: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

3

NCGDAP Specifics

Funding:$520,000 for 2005-2007$500,000 for 18 month extension

Staff:1.5 FTE at NCSUApprox. same at NCCGIA

Website: http://www.lib.ncsu.edu/ncgdap

Page 4: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

4

Selected Geospatial Data Archive ProjectsProject Organizations Fundin

g

Persistent Archives Testbed

San Diego Supercomputer Center, NARA

NARA

VanMap San Diego Supercomputer Center

Inter-PARES

Geospatial Repository for Academic Deposit & Extraction

EDINA JISC

Geospatial Electronic Records

CIESIN NHPRC

various Carleton University various

National Geospatial Digital Archive

UC Santa Barbara NDIIPP

Maine GeoArchives State of Maine NHPRC

Page 5: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

5

Tracking data, map servers, and web services since 2000

Ranked 3rd in traffic among entry points to entire library website

Persistent identifiersusage trackingID links used in other sites

Community help in site maintenance

Project Roots: NCSU Libraries Data Directory

Page 6: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

6

0

10

20

30

40

50

60

70

80

90

100

2000 2001 2002 2003 2004 2005 2006 2007 2008

Nu

mb

er

of

Co

un

tie

s

Map Server

Data Download

WMS

100 Counties in North Carolina

County Map and Data Services in NC

Page 7: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

Carrboro, NC : Population 17,797 (2005 est.)

24 downloadable GIS data layers

4 WMS data layers

6 web mapping applications

9 downloadable PDF map layers

Page 8: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

8

Value in Older Data: Cultural Heritage

Future uses of data are difficult to anticipate (as with Sanborn Maps)

Page 9: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

9

Downtown Raleigh Near State Capitol

1914 Sanborn Map

Page 10: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

10

Downtown Raleigh Near State Capitol

1993 DOQQ

Page 11: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

11

Downtown Raleigh Near State Capitol

1999 Wake County Ortho

Page 12: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

12

Downtown Raleigh Near State Capitol

2005 Wake County Ortho

Page 13: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

13

Downtown Raleigh Near State Capitol

2005 Wake County Ortho

Imagery = DurableStatic Simple structureMostly open formats

Vector data = VolatileFrequent updateComplex structureMostly proprietary formats

Downtown Raleigh Near State Capitol

2005 Wake County Ortho

Imagery = DurableStatic Simple structureMostly open formats

Vector data = VolatileFrequent updateComplex structureMostly proprietary formats

Page 14: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

14

Geospatial Data Types – Cartographic

• GIS Software– Software project file (.mxd, .apr, …)– Data layer file (.avl, .lyr, …)

• PDF, GeoPDF map exports• Web Services-based representations

Page 15: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

15

Geospatial Data Types – Spatial Databases

• Vector, raster, and tabular data

• Relationships• Behaviors• Annotation• Data Models

Page 16: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

16

Other Geospatial Data Types – Place-based DataStreet Views Oblique Imagery

3D Images

Present-day value in location-based services and mobile applications

Future value for cultural heritage, descriptions of places

Tax Dept. Photos

Page 17: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

17

Other Geospatial Data Types: Web 2.0 Mashups

Page 18: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

18

Geospatial Data: Compelling Issues

Dynamic contentConstantly updated information

Data versioning

Digital object complexitySpatially enabled databases

Complicated, multi-component formats

Proprietary formats

Page 19: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

19

Digital Preservation Points of Failure

Data is not saved, or …

can’t be found, or …

media is obsolete, or …

media is corrupt, or …

format is obsolete, or …

file is corrupt, or …

meaning is lost

Page 20: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

20

Risks to Geospatial Data

Producer focus on current dataData overwrite as common practice

Future support of data formats in questionNo open, supported format for vector data

Shift to web services-based accessData becoming more ephemeral

Inadequate or nonexistent metadataImpedes discovery and use

Increasing use of spatial databases for data management

The whole is greater than the sum of the parts

Page 21: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

21

Preservation Business Case

Land use change analysis

Site location analysis

Real estate trends analysis

Disaster response

Resolution of legal challenges

Impervious surface change mapping

Page 22: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

22

Business Case: Identifying Land Use Changes

Use case: Land use and impervious surface change analysis

1993

2005

1998

2002

1999

Page 23: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

23

Page 24: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

24

Geospatial Data Preservation Challenges

Data CaptureBackups are common, but not long-term archives

Producer focus is on current data

Shift to web services-based access

Inadequate or Nonexistent MetadataConsistent NC survey stats: Only 40% of data producers create and maintain metadata

Page 25: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

25

Challenge: Vector Data Formats

No widely-supported, open vector formats for geospatial data

Spatial Data Transfer Standard (SDTS) not widely supported

Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”

Spatial DatabasesThe whole is more than the sum of the parts, and the whole is very difficult to preserve

Can export individual data layers for curation, but relationships and context are lost

Page 26: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

26

Challenge: Digital Object Complexity

XML DatabaseExport

XML DatabaseExport

TIFF Images •Pixel Value and Header file•World file•Coordinate System file•Metadata file

Shapefiles•Geometry file•Index file•Attribute file•Metadata file•Coordinate System file•Spatial Index files

Potential Ingest Objects

Files

• Multi-file dataset• Georeferencing• Metadata file• Symbols file• Additional documentation• License• Disclaimer• More

Metadata

• FGDC• Acquisition metadata• Transfer metadata • Ingest metadata• Archive rights• Archive processes• Collection metadata• Series metadata

Metadata Exchange Format (MEF) in GeoNetwork a form of content packaging

Page 27: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

27

Challenge: Cartographic Representation

Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.

Page 28: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

28

Other Challenges

Rights managementData versioningSemantic issuesContent PackagingLarge scale content transferIntegrating older analog materialsMore …

Page 29: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

29

Different Ways to Approach Preservation

Technical solutions: How do we preserve acquired content over the long term?

Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?

Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata

Page 30: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

30

Question: Frequency of Capture?

Content Exchange – Getting Data in Motion

Repository Development

Repository of Temporal Data Snapshots

Page 31: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

31

Frequency of Capture

Issue: How frequently should county and municipal vector data layers be captured in archives?

Parcels, centerlines, jurisdictions, zoning, …

Parcel Boundary Changes 2001-2004, North Raleigh, NC

Page 32: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

32

Frequency of Capture SurveysHow often should continually changing vector datasets be captured?Tap into data custodian understanding of production patterns and usesTap into local innovationLearn about local business drivers for data archiving

2006 and 2008 surveys of NC cities and counties2008 survey of archival practice in state agencies in NCPlanned survey of data users in NC

Page 33: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

33

FOC 2006 Survey Results: Overview

58% response, two-thirds of whom create and retain periodic snapshotsLong-term retention more common in counties with larger populationsStorage environments vary, with servers and CD-ROMs most commonWide variation in frequencies of capture.Offsite storage (or both onsite and offsite) is used by nearly half of the respondentsPopularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents

Page 34: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

34

Content Exchange Infrastructure

High volume of state/federal requests for local dataSolving the present-day problems of data sharing is a pre-requisite to solving the problem of long-term accessLeveraging more compelling business reasons to put the data in motion (disaster preparedness, business continuity, highway construction, census, …)Content exchange networks:

Minimize need to make contactAdd technical, administrative, descriptive metadataEstablish rights and provenance

Page 35: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

35

Content Exchange Infrastructure

Nov. 2007: NC Geographic Information Coordinating Council (GICC):

Ten Recommendations in Support of Geospatial Data Sharing released

Recommendation: “Establish archive and long term data access strategies”Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”

http://www.ncgicc.org/CurrentActivities/TenRecommendationsinSupportofGeospatialData/tabid/156/Default.aspx

Page 36: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

36

Getting the Data in Motion

Harvesting use cases for older data as part of outreach

Factors Driving Capture of Temporal Data

0.0%

10.0%

20.0%

30.0%

40.0%

50.0%

60.0%

IT policy Recordsretention

policy

Tax adminrules

Land usechangeanalysis

Resolutionof legalissues

Historicmapping

Other

% o

f R

es

po

nd

en

ts

Survey of current archiving practice among NC counties and municipalities

Page 37: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

37

Most costly part of archive development is identifying, negotiating acquisition, and then transferring data

Important ObjectivesMinimize Direct ContactDocument DataClarify RightsRoutinize Transfers

Leverage other business uses that put data in motion:

Continuity of operationsHighway PlanningFloodplain Mapping

Getting the Data in Motion

Page 38: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

38

Getting the Data in Motion

OrthophotoData DistributionSystem – “sneakernet”

Transfer of large quantities of imagery

Street Centerline Data Distribution System

Efficient transfer of data from 100 counties, with metadata and clarified rightshttp://www.ncstreetmap.com

NC GIS Inventory

• Efficient data identification• Adding preservation elements

NC OneMap Data Download and Viewer

• Public access• Data visualization

Page 39: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

39

Repository Development

Downloading or acquiring “low hanging fruit”

Tapping into current data flows

Developing our own metadata when necessary

Converting and preserving vector data in shapefile format

Page 40: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

40

Data Preservation

Complex data representations can be made more preservable (yet less useful) through simplification.

Conversion of various formats to shp

Image outputs (web services,

PDF maps, map image files)

Very hard to preserve:Software project files

Symbol sets

What about symbology meanings?

Layer definitions

Web service or API interactions

Page 41: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

41

Desiccated Data: PDF and GeoPDF

Cartographic outputs – analogous to paper maps

CombineDatasets

Data models

Classification

Symbolization

Annotation

More data intelligence

than in simple images

Page 42: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

42

Desiccated Data: PDF and GeoPDF

Explosion of geospatial PDF content in past few years

Standards issuesGeoPDF: TerraGo technology has withdrawn patent claim and is approaching OGC about open standards process

PDF: open ISO standard with subset of geospatial functionality in ISO PDF standard part 2

Open PDF variants created through ISO standards process (PDF/E, PDF/X, PDF/A, …)

PDF content retained in addition to, NOT instead of data

Page 43: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

43

Cartographic Preservation Side Project:

1:500,000 – 1:2.5 M1:31,680 – 1:430,000

1,200 – 24,000

Scanned, georeferenced, and compressed over 286 NC geologic maps, in cooperation with NC Geologic

Survey

Page 44: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

44

Repository Status

Acquired 6+ TB of data with more on the way

Disk space being used initially for “data staging”Inventorying

In the process of ingesting content into DSpaceMetadata generation

Page 45: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

45

Engaging Spatial Data Infrastructure

Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be archived—from point of production?

Engage and outreach to the data producer community and SDISell the problem to software vendors and standards developmentFind overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc.Discuss roles at the local, state, and federal level

Page 46: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

46

Data inventories support content identification

Metadata standards support discoverability and use

Content standards support data interoperability over time and help eliminate semantic confusion

Data exchange networks:Minimize need to make contactAdd technical, administrative, descriptive metadataEstablish rights and provenance

SDI Role in Data Preservation

Page 47: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

47

NC Spatial Data Infrastructure: NC OneMap

Next generation mechanism to coordinate and disseminate geographic information in North Carolina and interact with the NSDI.

NC GICC

Inventory for all geospatial data holdings – http://nc.gisinventory.net

Develop content standards for key data themes

One of the defined characteristics of

NC OneMap is that “Historic and

temporal data will be maintained

and available”.

Page 48: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

48

Archival and Long Term Access Working Group

Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to dataFederal, state, regional, and local agency representationKey focus

Best practices for data snapshots and retentionState Archives processes: appraisal, selection, retention schedules, etc.

Valuable outcome of NCGDAP – multiple parties and levels discussing data archiving on their own.

Page 49: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

49

Archival and Long Term Access Working Group

Final Report to be presented to GICC in Nov.

Best Practices for: Archiving Schedule

Inventory

Storage Medium

Formats

Naming

http://www.ncgicc.org/CurrentActivities/ArchivalandLongTermAccessadhocCommittee/tabid/306/Default.aspx

Metadata

Distribution

Periodic Review

Data Integrity

Publicity

Page 50: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

50

NDIIPP Multi-State Geospatial Project

Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC

Partners:Leading state geospatial organizations of Kentucky and Utah

State Archives of Kentucky and Utah

NCSU Libraries in catalytic/advisory role

State-to-state and geo-to-Archives collaboration

2 year project: Nov. 2007-Dec. 2009

Archives as part of Spatial Data Infrastructure

Page 51: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

51

OGC Data Preservation Working Group

Formed Dec. 2006

Engage archival community

Find points of intersection with other OGC activities:

GML for archiving

Content packaging

Large scale data transfers

Time in decision support

Page 52: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

52

Cultural: Changing Industry Thinking

Is the geospatial industry “temporally-impaired?”Lack of access to older dataLack for tool/model support for temporal analysisMetadata: poor support for changing dataEducation: building class projects around available data (i.e., not temporal)

Increased interest now in temporal applications?Increased demand for temporal data?Improved tool support: ArcGIS 9.2 animation tools; Geodatabase History, etc.Emerging commercial market in older data

Page 53: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

53

“Supporting temporal analysis requirements” gets more attention than “archiving and preservation”

Leverage existing infrastructure

Current data sharing needs drive infrastructure improvements that help archiving

Leverage business needs that are more compelling than preservation (e.g., continuity of operations)

Facilitate stakeholder ownership of the solutions

Mine state and local archiving innovations

Conclusions

Page 54: Next Generation Archives: The NC Geospatial Data Archiving Project Jeff Essic Geospatial Data Services Librarian North Carolina State University Libraries

54

Slide Presentation:

http://www.lib.ncsu.edu/ncgdap/presentations.html

Steve Morris Jeff EssicHead, Digital Library Initiatives Geospatial Data Services LibrarianNCSU Libraries NCSU Librariesph: (919) 515-1361 ph: (919) [email protected] [email protected]