IT Research Challenges in Digital Preservation
Andreas Rauber
Department of Software Technology and Interactive Systems
Vienna University of Technology
http://www.ifs.tuwien.ac.at/~andi


Page 1: IT Research Challenges in  Digital Preservation

IT Research Challenges in Digital Preservation

Andreas Rauber
Department of Software Technology and Interactive Systems
Vienna University of Technology
http://www.ifs.tuwien.ac.at/~andi

Page 2: IT Research Challenges in  Digital Preservation


Overview

Why do we need Digital Preservation?

Digital Preservation Projects in Europe

IT-oriented Challenges in Digital Preservation

Some Digital Preservation Research at TUWIEN

Conclusions

Page 3: IT Research Challenges in  Digital Preservation


Why do we need Digital Preservation?

Page 4: IT Research Challenges in  Digital Preservation


Why do we need Digital Preservation?

Page 5: IT Research Challenges in  Digital Preservation

Why do we need Digital Preservation?

Digital objects require a specific environment to be accessible:
- Files need specific programs
- Programs need specific operating systems (and versions)
- Operating systems need specific hardware components

The SW/HW environment is not stable:
- Files cannot be opened anymore
- Embedded objects are no longer accessible/linked
- Programs won't run
- Information in digital form is lost (usually total loss, no degradation)

Digital Preservation aims at maintaining digital objects authentically usable and accessible for long time periods.
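
The dependency chain on this slide can be pictured as a simple lookup: an object stays accessible only while every layer beneath it is still available. The following Python sketch is illustrative only; the file name, the layer names and the `REQUIRES` table are hypothetical and not taken from the talk.

```python
# Hypothetical dependency chain: each item names the layer it needs.
REQUIRES = {
    "report.doc": "word processor",
    "word processor": "operating system",
    "operating system": "hardware",
}


def is_accessible(item: str, available: set) -> bool:
    """Walk the chain below `item`; fail as soon as one layer is missing."""
    while item in REQUIRES:
        item = REQUIRES[item]
        if item not in available:
            return False
    return True
```

Removing a single layer from `available` makes the object inaccessible as a whole, which mirrors the "total loss, no degradation" remark above.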

Page 6: IT Research Challenges in  Digital Preservation

Strategies for Digital Preservation

Strategies (grouped according to the Companion Document to the UNESCO Charter, http://unesdoc.unesco.org/images/0013/001300/130071e.pdf)

Investment strategies:
- Standardization, Data extraction, Encapsulation, Format limitations

Short-term approaches:
- Museum, Backwards-compatibility, Version-migration, Reengineering

Medium- / long-term approaches:
- Migration, Viewer, Emulation

Alternative approaches:
- Non-digital Approaches, Data-Archeology

No single optimal solution for all objects

Page 7: IT Research Challenges in  Digital Preservation

Migration

Transformation into a different format, continuous or on-demand (Viewer)

+ Wide-spread adoption
+ Possibility to compare to un-migrated object
+ Immediately accessible
- Unintended changes, specifically over a sequence of migrations
- Cannot be used for all objects
- Requires continuous action to migrate

Page 8: IT Research Challenges in  Digital Preservation

Emulation

Emulation of hardware or software (operating system, applications)

+ Concept of emulation widely used
+ Numerous emulators are available
+ Potentially complete preservation of functionality
+ Object is rendered identically
- Object is rendered identically
- Requires detailed documentation of system
- Requires knowledge on how to operate current systems in the future
- Complex technology
- Emulators must be emulated or migrated themselves
- Emulators potentially erroneous/incomplete

Page 9: IT Research Challenges in  Digital Preservation

Digital Preservation

Affects all domains
- Cultural heritage
- eGovernment
- Primary data: sensor data, experiment data
- Industry: production processes, workflows, monitoring
- Medical, Insurance/Banking
- Society: photos, communications

Test:
- Trying to repeat / verify "old" experiments
- Problems with
  • Data management: original test data, parameters, preprocessing, …
  • Code: compilability, change of libraries/functionality
  • Interpretability of results, know-how

Page 10: IT Research Challenges in  Digital Preservation

Digital Preservation

Is a complex task

Requires a concise understanding of the objects, their intellectual characteristics, the way they were created and used and how they will most likely be used in the future

Requires a continuous commitment to preserve objects to avoid the "digital dark ages"

Requires a solid, trusted infrastructure and workflows to ensure digital objects are not lost

Is essential to keep electronic publications, research data, … accessible

Will become more complex as digital objects become more complex

Page 11: IT Research Challenges in  Digital Preservation


Overview

Why do we need Digital Preservation?

Digital Preservation Projects in Europe

IT-oriented Challenges in Digital Preservation

Some Digital Preservation Research at TUWIEN

Page 12: IT Research Challenges in  Digital Preservation

Overview

Digital Preservation Projects in Europe (large number, small selection provided below)
- DPE: Digital Preservation Europe, EU, FP6
- Caspar: Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval
- Planets: Preservation and Long-term Access through Networked Services
- Shaman: Sustaining Heritage Access through Multivalent Archiving
- LIWA: Living Web Archives
- KeepIt: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It

IT-oriented Challenges in Digital Preservation

Some Digital Preservation Research at TUWIEN

Conclusions

Page 13: IT Research Challenges in  Digital Preservation

DPE

FP6 Coordinating Action

http://www.digitalpreservationeurope.eu

Page 14: IT Research Challenges in  Digital Preservation

Vision

What is DPE?

FP6 Coordinating Action

DigitalPreservationEurope (DPE) intends to create a coherent platform for proactive cooperation, collaboration, exchange and dissemination of research results and experience in the preservation of digital objects

Digital Preservation: ensuring long-term accessibility of digital objects

Mitigating the risk of a “digital dark age”

http://www.digitalpreservationeurope.eu

Page 15: IT Research Challenges in  Digital Preservation

Objectives

Two macro objectives:

1. to foster collaboration and synergies among on-going projects and existing initiatives across the ERA [repositories and audit and certification tools]

2. to raise awareness of digital preservation challenges among different user communities [different levels of awareness of the subject and its strategic significance]

Page 16: IT Research Challenges in  Digital Preservation

DPE Activities

• Range of activities to foster research and take-up in digital preservation

• Research Roadmap

• Digital Preservation Challenge

• Researcher and Practitioner Exchange

• DPE Videos

Page 17: IT Research Challenges in  Digital Preservation

Preservation Research Roadmap

The Roadmap aims at contributing to the planning of our future R&D in Digital Preservation by means of different actions:

- Analysing the state of the art in Digital Preservation research and already existing research agendas on a global level;
- Researching the needs and demands from the point of view of the Digital Preservation user communities and their leading experts;
- Researching the needs and demands of future markets for technology and service providers

Page 18: IT Research Challenges in  Digital Preservation

Research Roadmap

DPE Recommended Research

Restoration
Conservation
Collection and repository management
Preservation as risk management
Preserving the interpretability and functionality of digital objects
Collection cohesion and interoperability
Automation in preservation
Preserving the context
Storage technologies

Page 19: IT Research Challenges in  Digital Preservation

DPE Challenge

• Promotion of innovation in DP
• Targeted at students
• Main Goal: Provide access to and make digital objects useable
• Open to participants world-wide
• Submission deadline: May 30 2008
• http://www.digitalpreservationeurope.eu/challenge
• Different tasks, e.g.
  • Access Data in a Legacy Client-Server System
  • Proprietary File Format
  • Preservation of Multimedia Art
• Assessment of Submissions by an International Panel of Experts in the field

Page 20: IT Research Challenges in  Digital Preservation

DPE Videos

Raising Awareness of DP Issues

Experts & Practitioners:
- Briefing Papers, Seminars

General Public:
- little awareness, everybody affected

DPE Videos:
- series of short cartoons highlighting DP issues
- aimed at non-experts
- trying to communicate challenges in simple style
- Videos available on YouTube: http://www.youtube.com/user/wepreserve

Page 21: IT Research Challenges in  Digital Preservation

CASPAR

http://www.casparpreserves.eu

Page 22: IT Research Challenges in  Digital Preservation

CASPAR

How can digital data still be used and understood in the future when systems, software, and everyday knowledge continue to change? This is the CASPAR challenge.

The CASPAR project is mainly based on the OAIS standard ISO 14721:2003

Its architecture is defined for
- Managing key concepts of the OAIS reference model
- Supporting main functionality identified in the OAIS functional model

CASPAR aims to define and implement interfaces and functionally independent components

Page 23: IT Research Challenges in  Digital Preservation

Preservation Issue 1

Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved
- How to guarantee digital information may be accessed and understood in the future?
- How to guarantee retrieval of Archival Information?
- How to guarantee intelligibility of digital information within heterogeneous Designated Communities?

Non-maintainability of essential hardware, software or support environment may make the information inaccessible
- How to guarantee preservation actors are informed about change events?
- How to guarantee appropriate actions are undertaken to preserve Archival Information against change events?

Page 24: IT Research Challenges in  Digital Preservation

Preservation Issue 3

The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
- How to guarantee adequate integrity and identity for any Archival Information?

Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future
- How to guarantee adequately secure access, with the proper rights, to any resource and functionality within an Archive?

The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
- How to guarantee proper information package management within an Archive?
- How to guarantee long-term preservation maintenance of any information package?

Page 25: IT Research Challenges in  Digital Preservation


Planets

http://www.planets-project.eu

Page 26: IT Research Challenges in  Digital Preservation

The Planets project

4-year research and technology development project co-funded by the European Union
Addresses core digital preservation challenges
Started June 2006 with €15m budget
Coordinated by the British Library
16 partners
- national libraries and archives
- leading technology companies
- research universities
Builds on strong digital archiving and preservation programmes

Page 27: IT Research Challenges in  Digital Preservation

Planets partners

The British Library
National Library, Netherlands
Austrian National Library
State and University Library, Denmark
Royal Library, Denmark
National Archives, UK
Swiss Federal Archives
National Archives, Netherlands

Page 28: IT Research Challenges in  Digital Preservation

Planets partners

Tessella Plc
IBM Netherlands
Microsoft Research
Austrian Research Centers GmbH
HATII at University of Glasgow
University of Freiburg
Vienna University of Technology
University of Cologne

Page 29: IT Research Challenges in  Digital Preservation


The Planets team

All Staff Meeting, February 2007

Page 30: IT Research Challenges in  Digital Preservation

Planets Architecture

Preservation Planning Services
Characterisation Services
Preservation Action Services
Test Bed: evaluation and validation services
Interoperability Framework

Digital Content
Organisational Context
External Context
Technical Environment

Page 31: IT Research Challenges in  Digital Preservation

Preservation Action

Transform content
- Pluggable infrastructure for third-party migration tools

Transform environment
- Dioscuri: modular emulation of the full hardware/software environment
- Universal Virtual Computer (UVC): provides a layered durable approach to emulation

Preservation Action Tools registry
XML language for describing preservation action tools

Page 32: IT Research Challenges in  Digital Preservation

Preservation Characterisation

Characterisation framework
- Unifies tools for identifying file formats and extracting object properties

Characterisation registry
- Based on the file format registry PRONOM

eXtensible Characterisation Languages (XCL)
- Family of XML languages for characterising digital objects

Comparator verifies effects of preservation actions
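
To make the two ideas on this slide concrete, here is a minimal Python sketch, assuming a format can be recognised from its leading bytes and that object properties are available as plain dictionaries. It does not use the Planets services, PRONOM data or XCL; the signature table is a small hand-picked subset and the function names are hypothetical.

```python
# Illustrative signature table (assumption): a tiny hand-picked subset,
# not data taken from PRONOM.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def compare_properties(before: dict, after: dict) -> dict:
    """Return every property whose value changed, as (before, after) pairs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }
```

Calling `compare_properties` on the properties extracted before and after a migration yields an empty dictionary when nothing changed; any entry points at a property the preservation action altered.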

Page 33: IT Research Challenges in  Digital Preservation

Infrastructure and Testbed

Interoperability Framework provides common basis
- JBoss Application Server
- Logging, Security Services
- Registry services
- User management and Single-Sign-On

Planets Testbed
- Controlled environment for the execution of experiments
- Accumulated experience base collected in registry

Page 34: IT Research Challenges in  Digital Preservation


Preservation planning

Collection profiling services

Technology watch services

Risk assessment of digital objects

Preservation planning methodology

Tool support: Plato, the Planning Tool

Page 35: IT Research Challenges in  Digital Preservation

Preservation planning

Evaluating preservation strategies
- Variety of solutions and tools exist
- Each strategy has unique strengths and weaknesses
- Requirements vary across settings
- Decision on which solution to adopt is complex
- Documentation and accountability are essential

Preservation planning assists in decision making
- Evaluation of strategies on representative sample content according to specific requirements
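
One simple way to make such an evaluation documented and repeatable is a weighted score per candidate strategy. The Python sketch below is an assumption for illustration, not a description of the Planets planning method; the requirement names, weights and scores are invented.

```python
def score_strategy(measurements: dict, weights: dict) -> float:
    """Weighted average of per-requirement scores (each score in 0..1)."""
    total_weight = sum(weights.values())
    weighted = sum(weights[req] * measurements[req] for req in weights)
    return weighted / total_weight


def rank_strategies(candidates: dict, weights: dict) -> list:
    """Return candidate names ordered from best to worst score."""
    return sorted(
        candidates,
        key=lambda name: score_strategy(candidates[name], weights),
        reverse=True,
    )


# Hypothetical requirements and measurements on sample content.
weights = {"content": 0.5, "layout": 0.3, "cost": 0.2}
candidates = {
    "migrate": {"content": 1.0, "layout": 0.5, "cost": 1.0},
    "emulate": {"content": 1.0, "layout": 1.0, "cost": 0.0},
}
ranking = rank_strategies(candidates, weights)
```

Keeping the weights and measurements alongside the resulting ranking is what provides the accountability the slide asks for: the decision can be re-examined later with the same inputs.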

Page 36: IT Research Challenges in  Digital Preservation

Securing Communication with the future

Research & Development Project in Digital Preservation

Page 37: IT Research Challenges in  Digital Preservation


SHAMAN Objectives

SHAMAN will establish an Open Distributed Resource Management Infrastructure Framework enabling Grid-based Resource Integration, that is firmly grounded in a conceptual and technical reference architecture.

SHAMAN will develop and integrate technologies to support Contextual and Multivalent Archival and Preservation Processes to enable proper preservation management and policies.

SHAMAN will support Managing of Future Requirements by safeguarding Interoperability with Future Environments based on evidence gathered through the characterisation of digital objects, their (metadata) context and their preservation environment, resulting in the evolution of preservation policies.

Page 38: IT Research Challenges in  Digital Preservation

SHAMAN Outputs

SHAMAN will deliver a next-generation Digital Preservation framework, with three prototypical applications:
- scientific publishing in libraries and documents in governmental archives
- digital objects used in industrial design and engineering
- data resources used in e-Science applications

Page 39: IT Research Challenges in  Digital Preservation


SHAMAN Framework

Page 40: IT Research Challenges in  Digital Preservation


SHAMAN Consortium

SHAMAN Collaborators:

Page 41: IT Research Challenges in  Digital Preservation

FP7 project funded by the European Commission
Started in Feb 2008
EA, L3S, Max Planck, Hungarian Academy of Sciences, Hanzo Archives, libraries and archives

Page 42: IT Research Challenges in  Digital Preservation

Users and challenges identified

User type | Main concern | Locker
National Libraries | Size of archives | No control of size and its evolution over time, with implications for cost control
Other Libraries | Coherence | Selecting and keeping appropriate content for their user community is difficult on the web
Institutional Archives | Fidelity | Lack of fidelity to the original
TV and radio Archives | Variety of content type | Impossibility to archive streaming
Museums | Variety of content type | Difficulty to archive non-standard formats
Corporate Archives | Fidelity | Fidelity to the original and temporal coherence for compliance
Researchers | Fidelity | Difference between original web and what current WA can deliver
End Users | Interpretability | Impression of getting lost in WA

Page 43: IT Research Challenges in  Digital Preservation


Technology concerned

Page 44: IT Research Challenges in  Digital Preservation

Approach

Example: Semantic Evolution Detection

Time-Specific Term Contexts
- Leningrad@1970 (Soviet Union, Hermitage, Moscow, Neva River, Baltic Sea, …)
- Saint Petersburg@2009 (Russia, Hermitage, Moscow, Neva River, Baltic Sea, …)

Across-Time Semantic Similarity compares term contexts and shows high similarity between Leningrad@1970 and Saint Petersburg@2009

Term Coherence analyzes term contexts and shows that Saint Petersburg@2009 and Hermitage@2009 are commonly used together
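
The slide does not say which similarity measure is used. As a stand-in, the Python sketch below compares the two term contexts with Jaccard overlap, using only the context terms visible on the slide (the original lists are truncated with "…"). The measure and the `contexts` layout are assumptions for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Share of terms common to both contexts; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Context terms as listed on the slide; the originals are longer.
contexts = {
    ("Leningrad", 1970): {
        "Soviet Union", "Hermitage", "Moscow", "Neva River", "Baltic Sea",
    },
    ("Saint Petersburg", 2009): {
        "Russia", "Hermitage", "Moscow", "Neva River", "Baltic Sea",
    },
}

similarity = jaccard(
    contexts[("Leningrad", 1970)],
    contexts[("Saint Petersburg", 2009)],
)
```

With these truncated contexts, four of the six distinct terms are shared, so the two names come out as strongly related even though the strings themselves have nothing in common.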

Page 45: IT Research Challenges in  Digital Preservation

Approach

Good query reformulations contain query terms that are similar to the original query terms and that are commonly used together

Examples
- Saint Petersburg Museum → Leningrad Museum ✔
- Leningrad Cowboys → Saint Petersburg Cowboys ✖
- iPod Hearing Damage → Walkman Hearing Damage ✔
- disabled / handicapped / special needs

Page 46: IT Research Challenges in  Digital Preservation

KeepIt

Kultur, eCrystals, EdShare (and NECTAR) – Preserve It!

Page 47: IT Research Challenges in  Digital Preservation


Project Overview

Aim: To create a number of exemplar preservation repositories from which others can learn

Small number of very diverse repositories

Training

Development

Deployment


Page 48: IT Research Challenges in  Digital Preservation

Preservation

Long Term Reliable Storage

Risk Analysis

Mitigation / Action

Page 49: IT Research Challenges in  Digital Preservation

Long Term Reliable Storage

EPrints is expanding the number of places in which plug-ins can be utilised.

Export Plug-ins
Import Plug-ins
EPrints Core: Interfaces, Submission Manager
Database Controller
Storage Controller
CLOUD (Amazon S3)

Page 50: IT Research Challenges in  Digital Preservation


Overview

Why do we need Digital Preservation?

Digital Preservation Projects in Europe

IT-oriented Digital Preservation Challenges

Some Digital Preservation Research at TUWIEN

Page 51: IT Research Challenges in  Digital Preservation

DP Research

Some provocative (?) observations

IT R&D frequently suffers from a disconnect between academia and practice
- research independent of development
- theoretical results that cannot be applied to practice

DP R&D driven strongly by practice
- many good and useful results
- reactive instead of proactive
- results need to be applicable now
- lacks creative prospect into problems of the future
- lacks acceptability for non-perfect solutions
- little real IT research by IT experts

Page 52: IT Research Challenges in  Digital Preservation

DP Research

DP research requires several IT sub-disciplines

IT research in DP needs to
- build its own research agenda
- live in an open-minded environment allowing (initially) non-perfect solutions
- be evaluated following stringent standards of empirical evidence, validation and benchmarking
- be pro-active, foreseeing challenges of the future
- address a broader scope of topics that go beyond migration/emulation, metadata and data management and similar currently dominant issues

DP is an integral issue of all IT systems design

Page 53: IT Research Challenges in  Digital Preservation

DP Research

Urgently needed within DP community:

Identify IT areas that need to contribute to DP research

For each area, come up with the top-5 research questions

These research questions should be concrete
- formulated as a research hypothesis
- formulated as a PhD topic

How can we get these IT-disciplines involved?

How can we get IT researchers motivated?
- e.g. DPE research challenge

Page 54: IT Research Challenges in  Digital Preservation

DP Research

Potential areas:

Databases:
- split of data and function and its description
- PP-aware design and description
- modeling data semantics for DP

IT security
- secure documents, safe formats
- Signatures, long-term key management
- DRM
- long-term non-disclosure

Page 55: IT Research Challenges in  Digital Preservation

DP Research

Potential areas:

Information Retrieval:
- large-scale indexing and retrieval
- evolution of semantics and spelling
- modeling forgetting

Ethics
- privacy, digital personalities and forgetting
- information types and usage + IT support to enforce

Software Engineering
- DP as systems engineering
- secure workflows and trust
- certification of system for DP fitness

Page 56: IT Research Challenges in  Digital Preservation

DP Research

Potential areas:

Algorithms:
- semantics from code
- cross-compilation
- support for digital archeology
- evolution of the concept of file formats

Storage:
- advanced storage technologies
- management of large storage systems
- hybrid analog/digital storage
- self-describing/monitoring storage systems

Page 57: IT Research Challenges in  Digital Preservation

DP Research

Potential areas:

User interfaces
- Interfaces of the future
- How to preserve/communicate interfaces long gone by

Application domains
- effect of the quantum computer on DP of conventional systems
- mash-ups and distributed applications
- pervasive computing and sensor networks
- virtual worlds
- threat scenarios in DP
- home users

Page 58: IT Research Challenges in  Digital Preservation


DP Research

Many further areas – basically: all sub-disciplines of IT affected?

What would be the most challenging research questions in each of these?

How can we get experts in these disciplines involved with DP research?

How can we make DP research more solid research by IT standards?

Page 59: IT Research Challenges in  Digital Preservation


Overview

Why do we need Digital Preservation?

Digital Preservation Projects in Europe

IT-oriented Digital Preservation Challenges

Some Digital Preservation Research at TUWIEN

Page 60: IT Research Challenges in  Digital Preservation

Overview

Some Digital Preservation Research at TUWIEN
- Preservation Planning: PLATO
- Small Home Office Archiving: HOPPLA
- Establishing Context of Digital Information
- Evaluating Emulators
- Recovering Digital Objects from Audio Wave Form
- Preserving Virtual Worlds
- Ethical Issues in Web Archiving
- Digital Preservation Time Capsule

Conclusions

Page 61: IT Research Challenges in  Digital Preservation

Preservation Planning

Why Preservation Planning?

Several preservation strategies developed
- For each strategy: several tools available
- For each tool: several parameter settings available

How do you know which one is most suitable?

What are the needs of your users? Now? In the future?

Which aspects of an object do you want to preserve?

What are the requirements?

How to prove in 10, 20, 50, 100 years that the decision was correct / acceptable at the time it was made?

Page 62: IT Research Challenges in  Digital Preservation


Preservation Planning

Page 63: IT Research Challenges in  Digital Preservation


Preservation Planning

Page 64: IT Research Challenges in  Digital Preservation

Overview

Some Digital Preservation Research at TUWIEN
- Preservation Planning: PLATO
- Small Home Office Archiving: HOPPLA
- Establishing Context of Digital Information
- Evaluating Emulators
- Recovering Digital Objects from Audio Wave Form
- Preserving Virtual Worlds
- Ethical Issues in Web Archiving
- Digital Preservation Time Capsule

Conclusions

Page 65: IT Research Challenges in  Digital Preservation

Hoppla

Home Office Painless Persistent Long-term Archiving

Archiving solutions for
- SME
- SOHO
- Private users

No/little expertise
Service-oriented concept
Similar to antivirus software
User sends collection profile
Experts perform preservation planning
Rules for preservation actions are provided
Combines back-up and migration

Page 66: IT Research Challenges in  Digital Preservation

HOPPLA Principles

Need for bit-stream and logical object preservation
- combine back-up and migration

No expertise on and effort for digital preservation issues
- fully automatic solution outsourcing DP expertise, inspired by current antivirus solutions

Stability and system independence
- rely on plain file system storage with redundant XML metadata

Trust and accountability
- aim to fulfill core requirements of audit and certification initiatives

Privacy
- data resides with users, control over information sent to server
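
The combination of plain file system storage and redundant XML metadata can be sketched with a checksum "sidecar" file stored next to each object. This is a minimal Python illustration under stated assumptions, not HOPPLA's actual implementation: the `.meta.xml` suffix, the element names and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_for(path: Path) -> Path:
    """Hypothetical naming rule: metadata lives beside the object."""
    return path.with_name(path.name + ".meta.xml")


def write_sidecar(path: Path) -> Path:
    """Record name, size and checksum of `path` in an XML sidecar file."""
    root = ET.Element("object")
    ET.SubElement(root, "filename").text = path.name
    ET.SubElement(root, "size").text = str(path.stat().st_size)
    ET.SubElement(root, "sha256").text = sha256_of(path)
    sidecar = sidecar_for(path)
    ET.ElementTree(root).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar


def verify(path: Path) -> bool:
    """True if the stored checksum still matches the file's bit-stream."""
    recorded = ET.parse(str(sidecar_for(path))).getroot().findtext("sha256")
    return recorded == sha256_of(path)
```

Because both the object and its metadata are ordinary files, the check keeps working without any database or server, which fits the stability and privacy principles listed above.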

Page 67: IT Research Challenges in  Digital Preservation


HOPPLA Architecture

Page 69: IT Research Challenges in  Digital Preservation

Context of Information

Digital information objects are not isolated
- Exist in a specific context (in relation to other objects)

Context is important for
- Correct interpretation
- Establishing authenticity
- Ensuring appropriate use

Context is difficult to establish/document
Often missing / incomplete / incorrect when manually entered
Automatically extract context of objects
- Establish contextual relations between them, generate new meta-data

Visualisation/interaction tool

Page 70: IT Research Challenges in  Digital Preservation

Context of Information

Context Dimensions:

Currently establishing context along
- Time (creation, modification, ...)
- Type, e.g. MIME types
- Contributors / Social: people involved
  • Creators, Modifiers, Users
- Content related features
  • e.g. same images embedded, same keywords

Other types of dimensions possible, e.g. concurrent usage of documents, ...

Applications:

Ingest of donations, disaster recovery, IR
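
As a rough illustration of the dimensions listed above, the following sketch relates pairs of objects along time, type, contributor, and content. It is not the tool presented in the talk: the `Obj` record, the one-hour time window, and the function names are assumptions, and a real system would extract these values from the files themselves.

```python
import mimetypes
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Obj:
    name: str
    modified: float       # modification time, seconds since the epoch
    creator: str
    keywords: frozenset


def relations(a: Obj, b: Obj, window: float = 3600.0) -> set:
    """Return the context dimensions along which two objects are related."""
    dims = set()
    if abs(a.modified - b.modified) <= window:
        dims.add("time")
    type_a = mimetypes.guess_type(a.name)[0]
    if type_a is not None and type_a == mimetypes.guess_type(b.name)[0]:
        dims.add("type")
    if a.creator == b.creator:
        dims.add("contributor")
    if a.keywords & b.keywords:
        dims.add("content")
    return dims


def context_graph(objs) -> dict:
    """Map each related pair of object names to its shared dimensions."""
    graph = {}
    for a, b in combinations(objs, 2):
        dims = relations(a, b)
        if dims:
            graph[(a.name, b.name)] = dims
    return graph
```

The resulting pair-to-dimensions mapping is the kind of automatically generated metadata that could feed a visualisation or interaction tool.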

Page 71: IT Research Challenges in  Digital Preservation

Context of Information

Data Warehouse – Snowflake schema

Page 72: IT Research Challenges in  Digital Preservation

Context of Information

Page 74: IT Research Challenges in  Digital Preservation

Evaluation of Emulation

Testing if significant properties stay intact
It is well known how to extract and compare significant properties for migrated objects
With emulation, the original object is unchanged, so a comparison of the rendered version is necessary
Detection of a change in behaviour of the object
Interactivity has to be considered (applications, video games, interactive art)

Page 75: IT Research Challenges in  Digital Preservation

Evaluation of Emulation

Goals
Perform repeatable experiments
Extract significant properties from the rendering process
Automatically compare significant properties extracted from different emulation environments
Allow preservation planning for emulation environments
Automate parts of the process of testing emulators

Page 76: IT Research Challenges in  Digital Preservation

Evaluation of Emulation

Different significant states
- target state, series of states, continuous stream

Extracting properties from emulation environment
- in characterization language (e.g. XCL)
- e.g. cycles, frame rate (average/min/max), number of files/bytes accessed on I/O devices, event logs, screenshots, video streams
- not supported yet by emulators

Deterministic behaviour of object necessary
- identify and keep constant causes of non-deterministic behaviour
- e.g. user input, hardware timer values, random seed generation

Extracting rendered object from emulation environment
- from different levels: system memory, video memory, output device
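
A minimal sketch of the comparison step, assuming the properties have already been extracted into plain dictionaries (the slides note that emulators do not yet support this extraction). The property names, the 5% tolerance, and `compare_properties` are hypothetical and are not part of XCL or any emulator API.

```python
import math


def compare_properties(reference: dict, candidate: dict, rel_tol: float = 0.05) -> dict:
    """Compare two property sets; numeric values may deviate by rel_tol."""
    report = {}
    for key in sorted(reference.keys() | candidate.keys()):
        if key not in reference or key not in candidate:
            report[key] = "missing"
            continue
        ref, cand = reference[key], candidate[key]
        if isinstance(ref, (int, float)) and isinstance(cand, (int, float)):
            # Timing-related values rarely match exactly, so allow a tolerance.
            same = math.isclose(ref, cand, rel_tol=rel_tol)
        else:
            # Everything else (e.g. checksums of screenshots) must match exactly.
            same = ref == cand
        report[key] = "equal" if same else "different"
    return report
```

Such a report is only meaningful when the runs are deterministic, which is why the sources of non-determinism named above (user input, timer values, random seeds) have to be held constant between environments.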

Page 78: IT Research Challenges in  Digital Preservation

Recovering Digital Objects from Audio Wave Form

Original system Philips G7400 from 1983 encodes data in audio streams for recording on audio tapes

Migration Tool to extract the encoded data from the audio stream and migrate to non-obsolete formats

Extracted data: Software, screenshots, text & numeric data
Can read data that is unreadable with original system
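
The general idea of recovering bytes from a waveform can be illustrated with a toy scheme. The slides do not describe the G7400's actual tape encoding, so everything here is an assumption: one sine cycle per window stands for a 0 bit, two cycles for a 1 bit, and the decoder simply counts sign changes per window.

```python
import math

SAMPLES_PER_BIT = 32  # assumed window length, not taken from the source


def encode(data: bytes) -> list:
    """Turn bytes into a synthetic waveform: 1 cycle per '0', 2 cycles per '1'."""
    wave = []
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            cycles = 2 if (byte >> shift) & 1 else 1
            wave.extend(
                math.sin(2 * math.pi * cycles * (i + 0.5) / SAMPLES_PER_BIT)
                for i in range(SAMPLES_PER_BIT)
            )
    return wave


def decode(wave: list) -> bytes:
    """Recover bytes by counting sign changes inside each bit window."""
    bits = []
    for start in range(0, len(wave) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        window = wave[start:start + SAMPLES_PER_BIT]
        crossings = sum(
            1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0)
        )
        # One cycle yields 1 sign change, two cycles yield 3.
        bits.append(1 if crossings >= 2 else 0)
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

Since the decoder looks only at sign changes, it is unaffected by a uniformly attenuated signal; a real migration tool would additionally have to cope with noise, tape speed variation, and the original format's framing.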

Page 80: IT Research Challenges in  Digital Preservation

Preserving Virtual Worlds

Alternative strategy: rather than extracting the objects and world data, scenes of interaction are recorded

Drone that moves inside Second Life and video records areas with user action

Besides technical difficulties, there are ethical and legal issues

Page 82: IT Research Challenges in  Digital Preservation

Ethics & Web Archiving

Web is very volatile
Web archiving is an essential activity to ensure valuable content is being preserved
Web Archives contain a wealth of extremely valuable information

But:

Currently most archives are closed to public
Mostly due to legal reasons
Need a legal solution

Is this all?

Page 83: IT Research Challenges in  Digital Preservation

Ethics & Web Archiving

What should such a legal solution look like?
Is it only a legal problem?

There are things that are legal, but ethically dubious
(There are things that are illegal, but ethically acceptable)

Privacy is an essential good
Most societies are increasingly privacy-aware
Are there ethical concerns, and if so
- Are we aware of them?
- Can we do something to address them?

Page 84: IT Research Challenges in  Digital Preservation

Ethics & Web Archiving

Assumptions and a number of questions:

The Web is a new publication medium – Is it?

The ephemeral nature of Web pages is a “design fault” – Is it?

A Web Archive is merely a collection of publicly available information – Is it?

Page 85: IT Research Challenges in  Digital Preservation

Ethics & Web Archiving

Assumptions underlying Web Archiving:

The Web is a new publication medium?
- Are people “publishing”? (conscious decision, effort invested, …)
- If so, are they aware of it?
- Are kids allowed to publish?
- Which parts of the Web are publishing, which are communication? (a kind of chatting-in-the-bus?)
- Do we have a choice of NOT putting some things on the Web?

Page 86: IT Research Challenges in  Digital Preservation

Ethics & Web Archiving

Assumptions underlying Web Archiving:

The ephemeral nature of Web pages is a “design fault”?
- Post-it notes are based on a “faulty” glue -> should we put real glue onto them?
- If the Web is a publication medium: may there be some who use it as such BECAUSE it is ephemeral? (art, temporary announcements, CV, …)
- Does being ephemeral make it more a communication medium in the perception of some people?
- Does society need an ephemeral way of communicating with larger communities? (speaker’s corner, graffiti, …)

Page 87: IT Research Challenges in  Digital Preservation

Ethics & Web Archiving

Assumptions underlying Web Archiving:

A Web Archive is merely a collection of publicly available information
- True, but what about Holism? (The whole is more than the sum of its parts)
- Does the ease of use, or the new possibilities of use, change the nature of an information collection? (full-text search, semantic analysis, IR as opposed to conventional archive catalogs)
- Specialized person profile search engines, used by HR departments (special profile generation services to counter-act this)
- Technical possibilities will increase in the future (video analysis, semantic analysis, reasoning, …)

Page 88: IT Research Challenges in  Digital Preservation

Ethics & Web Archiving

Research Issues:
What the ethical constraints are, and how they can be more precisely defined or formalized,
Which approaches users of Web archives with potentially dubious intentions might employ to obtain information that should not be provided by privacy-respecting archives,
To what extent technological solutions such as query analysis, machine learning and data mining can help in identifying potentially harmful queries, potentially incriminating content on Web pages, information worthy of protection, or combinations thereof,
How legal regulations might be formulated in order to allow (partial) access to Web archive content in a safe, ethically correct, and useful manner

Page 90: IT Research Challenges in  Digital Preservation

Digital Preservation Time Capsule

Digital Preservation suffers from
a lack of public awareness
a lack of solid understanding of the levels of complexity
being abstract / intangible
failing to capture people’s imagination
seeming to be rather simple (only storage?)

even among some experts

Page 91: IT Research Challenges in  Digital Preservation

Digital Preservation Time Capsule

The Planets TimeCapsule
is a scientifically solid & visually appealing showcase demonstrating general DP challenges & Planets solutions
is a tangible and exciting demo showing the level of complexity & the amount of information involved in preserving a few selected objects
aims at capturing the public’s and experts’ imagination, benefitting from a leveraging effect by involving media
constitutes a lasting legacy for Planets
may serve as a basis for training, exhibitions and future research

Page 92: IT Research Challenges in  Digital Preservation

Digital Preservation Time Capsule

The Planets TimeCapsule is inspired by

Voyager Golden Record
Rosetta Stone
Long Now Rosetta Project
Clock for the Long Now

and other initiatives aimed at making long-term thinking graspable

Page 93: IT Research Challenges in  Digital Preservation

Digital Preservation Time Capsule

Pick a set of Source Objects
Describe them with PC-tools and PREMIS metadata
Add representation information
- file format standards & documentation
- programming language definitions, compiler info (also for secondary objects)
Add viewer (binary + source + OS + PREMIS + PC)
Migrate them to more stable formats
- PA tools: description + source (+ PREMIS + PC)
- PP: plan and evaluation of loss (+ PREMIS + PC)
Store them on different data carriers
- Carrier description
- Device description
- File system description

Page 94: IT Research Challenges in  Digital Preservation

Overview

Why do we need Digital Preservation?

Digital Preservation Projects in Europe

IT-oriented Digital Preservation Challenges

Some Digital Preservation Research at TUWIEN

Page 95: IT Research Challenges in  Digital Preservation

Summary

Digital Preservation is an important issue
Affects everybody and in all domains
- cultural heritage, industry, science, society at large

Significant research & development efforts
Number of solid solutions
Number of challenging open research issues
Need to involve core IT experts from different domains
Need to change perspective on DP research:
- from ex-post to pro-active
- from external system to integrated part of all IT system design

Page 96: IT Research Challenges in  Digital Preservation

Page 97: IT Research Challenges in  Digital Preservation

iPRES 02010 Dates

Paper, Tutorial, Panel & Workshop Submission: 5 May 2010
Notification of Acceptance: 18 Jun 2010
Submission of Final Versions: 11 Jul 2010
iPRES 02010: September 19-25 2010

http://www.ifs.tuwien.ac.at/dp/ipres2010

Page 98: IT Research Challenges in  Digital Preservation

Thank you!

http://www.ifs.tuwien.ac.at/dp