A Data Model and Architecture for Long-term Preservation

Greg Janée, Justin Mathena, James Frew
University of California at Santa Barbara

Greg Janée • JCDL 2008 2

Outline

• Project overview
• Character of geospatial data
• Observations on preservation
  – requirements
• Architecture
• Ongoing work

Project overview

• National Geospatial Digital Archive (NGDA)
  – UCSB (Map & Imagery Laboratory)
  – Stanford (Branner Earth Sciences Library)
• Funded by Library of Congress’s NDIIPP program

How to achieve long-term preservation of geospatial data on a national scale?

Geospatial data characteristics

• Voluminous
• Sensor platforms are long-lived
• Highly structured
  – support not ubiquitous
• Requires specialized interpretation
• Tied to Earth models

Starting point

[Figure: timeline from "now" to "now + 100 years", with labels "content" and "take action"]

Preservation: relay across time

[Figure: timeline from "now" to "now + 100 years", with rows labeled "repository system", "storage system", and "institution"]

Requirement

Each archive facilitates handoff to the next

Mid-century perspective

[Figure: timeline spanning "now - 50", "now", and "now + 50", with labels "ancient content", "old content", "content", and "take action"]

Requirement

Each archive facilitates handoff to the next

... on unfamiliar content

... such that the next archive can make the same claim

Preservation: mitigation of risk

• Preservation is an outcome
• Risk: insufficient resources and/or desire
• Risk: handoff
  – e.g., from failing institution
  – e.g., from unsupported repository system

Requirement

Each archive supports a low-cost, robust “fallback” preservation mode

Preservation: context

[Figure: objects (data + metadata) and their surrounding context, shown in 2008 and again in 2108, with labels "capture" and "migrate". Context labels: computing platform, semantics, terminology, provenance, provider, quality, appropriate usage, community.]

Geospatial data context

• Complex
  – sensor, platform characteristics
• In practice, not handled as metadata
• Deep understanding of provenance required
  – to support reprocessing

Ozone reprocessing requirements

• xDRs
• Delivered IPs
• Engineering data (incl. C3S data if not in RDRs)
• Upload files
• Databases
• Software (source code)
• Calibration artifacts
  – data
  – analysis tools
  – tables
  – logs
  – notebooks
  – instrument design
• All project documentation
• All scientific papers
• All reports

Taken from: Mike Linda, “OMPS Aggregation and Packaging,” 2006 CLASS Users’ Workshop

Requirement

Context must be preserved

... and context must accommodate complex networks of objects

Architecture

[Figure: layered architecture, top to bottom]
• archive: management, policies, services, access
• logical data model: standard packaging of data, semantics
• physical data model: survivable, vendor-neutral representation of above
• storage virtualization layer: seamless movement, reliability, redundancy
[Side labels on the figure: domain-specific; best practices / interop standard; interop standard]

Architecture

• Logical data model captures all information required to resurrect, reuse objects
• Includes archival of format specs, metadata, contextual information, transitive closure thereof
• NGDA: archival objects
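The "transitive closure" requirement can be illustrated with a small graph walk. This is only a sketch under assumptions: the dependency table, the function name `transitive_closure`, and every identifier other than `cnty24k97` are hypothetical and do not come from NGDA.

```python
from collections import deque
from typing import Dict, List, Set


def transitive_closure(start: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return start plus everything reachable from it through depends_on."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:  # also guards against cycles
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency table: an object depends on format and
# metadata specifications, which depend on further specifications.
depends_on = {
    "cnty24k97": ["shapefile-format-spec", "metadata-spec"],
    "shapefile-format-spec": ["dbf-format-spec"],
    "metadata-spec": ["xml-spec"],
}

closure = transitive_closure("cnty24k97", depends_on)
```

Under this reading, an archive would need to hold every member of `closure`, not just the object itself, for the object to be resurrected later.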

Architecture

• Physical data model fully and simply represents logical data model
• No vendor lock-in
• NGDA: files, filesystems, XML manifests

Architecture

• Storage virtualization layer supports intra- and inter-archive handoffs
• NGDA: “logistical networking”

Architecture: fallback

• Combination of complete resurrection information with a simple physical representation provides fallback mechanism

Architecture: handoffs

[Figure: two archives, each with its own logical data model and physical data model, drawn with a storage virtualization layer; labels "export" and "ingest" connect the archives]
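One possible reading of export and ingest, when the physical data model is just files in directories, is a plain tree copy followed by a fixity comparison. The sketch below assumes that reading; the function names `tree_digests` and `hand_off` and the choice of SHA-256 are illustrative assumptions, not part of the NGDA design.

```python
import hashlib
import os
import shutil
from typing import Dict


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def hand_off(export_dir: str, ingest_dir: str) -> None:
    """Copy an exported object tree and verify that nothing changed."""
    # copytree requires that ingest_dir does not exist yet.
    shutil.copytree(export_dir, ingest_dir)
    if tree_digests(export_dir) != tree_digests(ingest_dir):
        raise IOError("fixity check failed after handoff")
```

Nothing here depends on a particular repository system, which is the property a low-cost fallback mode would need.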

Logical data model

Example archival object

Physical data model

[Directory listing; nesting not recoverable from the slide text]
...
identifier/
manifest.xml
cnty24k97.xml
data/
source/
cnty24k97.shp
cnty24k97.dbf
...
cnty24k97.png

• object structure
• fixity metadata
• inter- and intra-object relationships
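As an illustration of a manifest that records object structure, fixity metadata, and relationships, here is a minimal sketch. The element and attribute names (`manifest`, `file`, `relation`, `sha256`) are hypothetical and are not the NGDA manifest schema, which the slides do not show.

```python
import hashlib
import os
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple


def build_manifest(
    object_dir: str,
    identifier: str,
    relations: Sequence[Tuple[str, str]] = (),
) -> ET.Element:
    """Describe every file under object_dir, plus links to other objects."""
    root = ET.Element("manifest", {"identifier": identifier})
    for dirpath, dirnames, filenames in os.walk(object_dir):
        dirnames.sort()  # make traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, object_dir).replace(os.sep, "/")
            if rel == "manifest.xml":
                continue  # the manifest does not describe itself
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            # Structure (path) and fixity (size, digest) for one file.
            ET.SubElement(root, "file", {
                "path": rel,
                "size": str(os.path.getsize(full)),
                "sha256": digest,
            })
    for relation_type, target in relations:
        # Inter-object relationship, e.g. to a format specification object.
        ET.SubElement(root, "relation", {"type": relation_type, "target": target})
    return root
```

The result can be serialized with `ET.tostring(root)` and written next to the data, so the object stays interpretable with nothing more than a filesystem and an XML parser.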

Storage abstraction

• Bitstreams
  – create, (delete), read, write
  – no modify

• Directories
  – create, (delete), list members

• Above identified by hierarchical pathnames

• Satisfied by filesystems, WebDAV, ...
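The operations above are few enough to sketch as an interface. The following is a hedged illustration only: the class and method names (`Storage`, `FilesystemStorage`, `write_bitstream`, and so on) are hypothetical, and the optional delete operations are left out.

```python
import os
from abc import ABC, abstractmethod
from typing import List


class Storage(ABC):
    """Bitstreams and directories addressed by hierarchical pathnames."""

    @abstractmethod
    def create_directory(self, path: str) -> None: ...

    @abstractmethod
    def list_members(self, path: str) -> List[str]: ...

    @abstractmethod
    def write_bitstream(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read_bitstream(self, path: str) -> bytes: ...


class FilesystemStorage(Storage):
    """One backend that satisfies the interface: a local directory tree."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _full(self, path: str) -> str:
        # Translate a "/"-separated pathname into a local filesystem path.
        return os.path.join(self.root, *path.strip("/").split("/"))

    def create_directory(self, path: str) -> None:
        os.makedirs(self._full(path), exist_ok=True)

    def list_members(self, path: str) -> List[str]:
        return sorted(os.listdir(self._full(path)))

    def write_bitstream(self, path: str, data: bytes) -> None:
        # Mode "xb" creates the file and fails if it already exists,
        # which enforces "no modify".
        with open(self._full(path), "xb") as f:
            f.write(data)

    def read_bitstream(self, path: str) -> bytes:
        with open(self._full(path), "rb") as f:
            return f.read()
```

A WebDAV backend could implement the same four methods, which is what would make such a layer a candidate interoperability point between archives.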

Archive dependencies

• Filesystem
• XML
• Character set(s)
• Identifier resolution mechanism(s)

Summary

• Architecture to facilitate handoffs, reduce risk, provide fallback
  – best practices
  – interoperability potential

• Ongoing work
  – “logistical networking” for storage virtualization
  – preservation profiles for other data models
  – format registries and other archive dependencies
  – whole-archive descriptor
    • dependencies, policies

Questions?