Steps Towards a Theory of Information Preservation
Giorgos Flouris, Carlo Meghini
Istituto di Scienza e Tecnologie dell’ Informazione (ISTI)
CNR, Pisa, Italy
{flouris,meghini}@isti.cnr.it
Invited Talk (PresDB-07)
23/03/2007 Giorgos Flouris, PresDB-07 2
Introduction
• Preservation:
  – Very important, difficult and interesting problem
  – Need for preservation is self-evident
• Notes on this work:
  – Ongoing work for CASPAR (suggestions welcome)
  – About digital objects (not about databases, but can be applied to databases)
  – The focus of this work is not to perform preservation, but to describe formally what it means to perform preservation
Purpose
We are trying to come up with a formal, mathematical, logic-based description of preservation as a scientific discipline, with the aim of deriving a methodology resting on solid grounds
(we will then try to apply this methodology to CASPAR)
The Need for a Theory of Information Preservation
• Why is such a theory important?
  – A formal, theoretical, mathematical framework allows proving impossibility and existence results
  – It allows us to ground existing (and future) methods upon a common formalism, so that they can be compared
  – It provides a set of formal desirable properties for existing and future preservation methods
  – It allows proving that a preservation method works well (or does not work well)
• Where practitioners believe, a theory can prove
Preservation Types
[Figure: a producer and a consumer separated in time, at three levels (Knowledge Level, Symbol Level, KR Level). The producer understands the concept "The first letter of the English alphabet", writes the symbol "A" and writes the bits 01000001; the consumer reads the bits, reads the symbol and understands the same concept. Bit Preservation concerns the bits, Data (or Object) Preservation the symbol, and Information Preservation the concept.]
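The three levels can be imitated in a few lines of Python. This is a minimal sketch, not part of the formalism: it assumes 8-bit ASCII as the bit format and uses a hypothetical dictionary, `UCK_CONCEPTS`, as a stand-in for the knowledge that turns a symbol into a concept.

```python
# Bit level: the byte written by the producer (ASCII code of "A").
bits = "01000001"

# Symbol level: decoding needs the bit format (assumed here: 8-bit ASCII).
symbol = chr(int(bits, 2))

# Knowledge level: the concept comes from community knowledge.
# UCK_CONCEPTS is a hypothetical stand-in for the producer's UCK.
UCK_CONCEPTS = {"A": "The first letter of the English alphabet"}
concept = UCK_CONCEPTS[symbol]

print(bits, "->", symbol, "->", concept)
# 01000001 -> A -> The first letter of the English alphabet
```

Each step uses information the bits do not carry themselves: the bit format for the symbol, and the community knowledge for the concept.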
Preservation Types: Example
City Temperature Date
Athens 12 08/03/07
Pisa 11 06/03/07
Edinburgh 8 11/03/07
Bit Preservation: Database is not corrupt (error correction techniques, backups, refreshment of media)
Data Preservation: Database can be opened (preserve format specification)
Information Preservation: Database can be understood (temperatures in Celsius, dates in dd/mm/yy)
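The gap between data and information preservation shows up as soon as the dates are read under the wrong convention. A minimal Python sketch; the helper `read_date` is hypothetical, and the two format strings stand for two readers’ assumptions (dd/mm/yy as intended, mm/dd/yy as a misreading):

```python
from datetime import datetime

rows = [("Athens", 12, "08/03/07"), ("Pisa", 11, "06/03/07"), ("Edinburgh", 8, "11/03/07")]


def read_date(text: str, fmt: str) -> str:
    """Interpret a date string under a given symbol format; return ISO 8601."""
    return datetime.strptime(text, fmt).date().isoformat()


for city, temp_c, date in rows:
    intended = read_date(date, "%d/%m/%y")  # producer's convention (dd/mm/yy)
    misread = read_date(date, "%m/%d/%y")   # a reader assuming mm/dd/yy
    print(f"{city}: {temp_c} °C on {intended} (misread as {misread})")
```

All three dates parse without error under both conventions, so nothing flags the misreading: the database opens (data is preserved), yet the information is lost.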
Statics: Digital Object and UCK
• A digital object depends on external information:
  – Bit format (ASCII codes, integer representation, …)
  – Symbols’ format (23/03/07 or 03/23/07)
  – Background knowledge (what is the meaning of 23/03/07?)
• A digital object is attached to a single Underlying Community Knowledge (UCK) that contains this information
• Therefore:
  – A digital object carries no meaning by itself
  – Its meaning (semantics) is derived from the attached UCK
Information to be Preserved: Questions and Answers
• Digital object: a set of questions and answers
  – Not all information in a digital object needs to be preserved
  – Example: a document (content, format, fonts, pagination)
• The exact information to be preserved depends on:
  – Type of digital object
  – Producer’s intentions
  – Digital object’s intended reader (Designated Community)
  – Legal issues
  – Practical considerations
  – …
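To make the question-and-answer view concrete, a document can be treated as a mapping from questions to answers, of which only a selected subset must survive. A minimal sketch; the questions, answers, the `PRESERVE` selection and the helper `is_preserved` are all hypothetical, chosen to echo the document example (content, format, fonts, pagination):

```python
# A digital object viewed as questions and their answers (hypothetical values).
document = {
    "What is the text?": "Hello, world",
    "What is the format?": "PDF",
    "Which font is used?": "Times",
    "How many pages?": 3,
}

# Hypothetical selection: only content and format must be preserved here.
PRESERVE = {"What is the text?", "What is the format?"}


def is_preserved(original: dict, migrated: dict, questions: set) -> bool:
    """True if every selected question keeps its original answer."""
    return all(migrated.get(q) == original[q] for q in questions)


# A migration that changes fonts and pagination but not content or format.
migrated = {**document, "Which font is used?": "Helvetica", "How many pages?": 4}
print(is_preserved(document, migrated, PRESERVE))  # True

# A migration that alters the content.
corrupted = {**document, "What is the text?": "Hello world"}
print(is_preserved(document, corrupted, PRESERVE))  # False
```

The same migration would count as a failure if fonts and pagination were among the selected questions, which is why the selection depends on the factors listed above.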
Statics: Information Preservation Structure (IPS)
• IPS = UCK + Digital Object
  – UCK: <L, T>
  – Digital Object: <Q, ans>
• L is further broken down:
  – L = <LL, V, VI, P, PC, ⊧>
[Figure: tree with IPS at the root; below it UCK (children L and T) and Digital Object (children Q and ans); L expanded into LL, V, VI, P, PC, ⊧.]
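The nesting of the IPS can be written down as plain data types. A minimal sketch in Python, assuming sentences are strings; since the slides do not spell out LL, V, VI, P and PC, they are kept as opaque fields, and only ⊧ is given behaviour, through a caller-supplied `entails` function (a hypothetical name, as is the method `holds`).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet

Sentence = str  # assumption: sentences of the logic are represented as strings


@dataclass(frozen=True)
class Logic:
    """L = <LL, V, VI, P, PC, ⊧>; only ⊧ is given behaviour in this sketch."""
    entails: Callable[[FrozenSet[Sentence], Sentence], bool]  # the relation ⊧
    LL: Any = None  # remaining components, left opaque here
    V: Any = None
    VI: Any = None
    P: Any = None
    PC: Any = None


@dataclass(frozen=True)
class UCK:
    L: Logic
    T: FrozenSet[Sentence]  # the logical theory


@dataclass(frozen=True)
class DigitalObject:
    Q: FrozenSet[str]    # the questions
    ans: Dict[str, Any]  # the answer to each question


@dataclass(frozen=True)
class IPS:
    uck: UCK
    obj: DigitalObject

    def holds(self, s: Sentence) -> bool:
        """Does the UCK's theory entail s (T ⊧ s)?"""
        return self.uck.L.entails(self.uck.T, s)


def membership(theory: FrozenSet[Sentence], s: Sentence) -> bool:
    """Deliberately trivial entailment: s follows from T iff s is in T."""
    return s in theory


ips = IPS(
    UCK(Logic(entails=membership), frozenset({"dates are written dd/mm/yy"})),
    DigitalObject(frozenset({"When was Pisa measured?"}),
                  {"When was Pisa measured?": "06/03/07"}),
)
print(ips.holds("dates are written dd/mm/yy"))  # True
```

Passing ⊧ in as a function keeps the structure independent of any particular logic; a real instance would supply a proper inference procedure.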
IPS and Preservation Models
• Preservation models provide a methodological framework for determining the content of an IPS
  – OAIS (ISO standard 14721:2003)
    • Representation Information (UCK)
      – Structural Information
      – Semantic Information
    • Preservation Description Information (questions and answers)
      – Provenance
      – Reference
      – Context
      – Fixity
    • Digital object’s content (questions and answers)
Preservation and Change
• UCK evolves
  – If digital objects remained the same, they would either be unreadable or carry the wrong meaning
• Thus we need a methodology that will indicate the appropriate changes to all digital objects attached to a UCK, as a function of:
  – The old digital object
  – The old UCK (producer’s UCK)
  – The new UCK (consumer’s UCK)
  – The UCK evolution specification
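A toy instance of such a methodology: suppose the only UCK change is that the consumer’s community writes dates as yyyy-mm-dd rather than dd/mm/yy. Under that assumption the new digital object can be computed from the old one, with the two UCKs reduced to their date conventions. The function `evolve` is hypothetical and covers only this single, fully specified change.

```python
from datetime import datetime


def evolve(old_object: list, producer_fmt: str, consumer_fmt: str) -> list:
    """Re-encode the date column so that it keeps its meaning for the consumer.

    old_object: rows of (city, temperature, date written in producer_fmt).
    Assumption: the whole UCK evolution is this change of date convention.
    """
    new_object = []
    for city, temperature, date in old_object:
        day = datetime.strptime(date, producer_fmt)  # meaning under the producer's UCK
        new_object.append((city, temperature, day.strftime(consumer_fmt)))  # symbols for the consumer's UCK
    return new_object


old = [("Athens", 12, "08/03/07"), ("Pisa", 11, "06/03/07"), ("Edinburgh", 8, "11/03/07")]
new = evolve(old, producer_fmt="%d/%m/%y", consumer_fmt="%Y-%m-%d")
print(new)
```

Temperatures are left untouched because, by assumption, that part of the UCK (Celsius) did not change.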
Belief Change, Ontology Evolution and Information Preservation (1)
• Initial thought: use well-established methods from belief change (belief revision) and ontology evolution
• Not possible, in general:
  – The UCK may be a logic not supported by the above fields
  – Changes may affect the logic itself
  – Changes may be of infinite nature
  – Input/output may be different
• Example: Roman to Arabic numerals
  – III → 3
  – IV → 4
  – …
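Although the Roman-to-Arabic mapping has infinitely many pairs, it has a finite description as a program. A minimal Python sketch, assuming well-formed numerals in subtractive notation (it does not validate its input); `roman_to_arabic` is an illustrative name:

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_arabic(numeral: str) -> int:
    """Convert a well-formed Roman numeral to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller symbol before a larger one is subtracted (IV = 4, IX = 9).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


print(roman_to_arabic("III"), roman_to_arabic("IV"), roman_to_arabic("MCMXCIV"))  # 3 4 1994
```

A finite program standing for an infinite set of correspondences is exactly what a purely enumerative change description cannot provide.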
Belief Change, Ontology Evolution and Information Preservation (2)
• However, it is possible under some assumptions:
  – The logic does not change
  – The logic in the UCK is supported
  – The old UCK and digital object are known, and the evolution is known
  – The change can be finitely described using standard models
• Example from astronomy:
  – Pluto was a Planet
  – The definition of Planet changed recently (24/08/06, Prague)
  – Pluto was reclassified as a Dwarf Planet
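The Pluto example can be mimicked with a deliberately naive revision step on a set of facts. This is an illustration only: here the contradicted fact is named explicitly by the caller, whereas belief revision operators (e.g. AGM-style ones) must themselves decide what to retract. `revise` and the fact strings are hypothetical.

```python
def revise(theory: set, old_fact: str, new_fact: str) -> set:
    """Naive revision: retract the named contradicted fact, then add the new one."""
    return (theory - {old_fact}) | {new_fact}


before = {"Pluto is a Planet", "Mars is a Planet"}
after = revise(before, "Pluto is a Planet", "Pluto is a Dwarf Planet")

question = "Pluto is a Planet"
print(question in before, question in after)  # True False
```

A digital object answering "Is Pluto a Planet?" with "yes" was correct under the old UCK but would mislead a reader holding the revised one.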
Dynamics: Schematically (General Case)
[Figure: schematic with labels Producer, Expanded and Consumer.]
Various levels of preservation: complete, essential, modulo logical equivalence, indirect, approximate, partial, …
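A crude way to tell two of these levels apart is to compare the answers a producer and a consumer obtain for the same questions. A minimal sketch, assuming answer equality as the criterion; it covers only complete and partial preservation (plus a hypothetical "none" label), not the subtler levels such as modulo logical equivalence or approximate. `preservation_level` is a hypothetical name.

```python
def preservation_level(producer_answers: dict, consumer_answers: dict) -> str:
    """Classify preservation by how many producer answers the consumer reproduces."""
    kept = sum(1 for q, a in producer_answers.items() if consumer_answers.get(q) == a)
    if kept == len(producer_answers):
        return "complete"
    if kept > 0:
        return "partial"
    return "none"


producer = {"temperature in Pisa?": 11, "date in Pisa?": "2007-03-06"}
print(preservation_level(producer, {"temperature in Pisa?": 11, "date in Pisa?": "2007-03-06"}))  # complete
print(preservation_level(producer, {"temperature in Pisa?": 11, "date in Pisa?": "2007-06-03"}))  # partial
```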
Dynamics: IPS Evolution Structure (IPSES)
[Figure: the IPSES, drawn as three IPS trees labelled Producer, Expanded and Consumer, each of the form IPS → UCK <L, T> + Digital Object <Q, ans>, with L = <LL, V, VI, P, PC, ⊧>; a ⊇ symbol relates the trees.]
Mapping needs a finite representation: Turing Machines
IPSES’ definition is incomplete
Need a way to compute the evolved digital object from the information given (old digital object, producer’s UCK, consumer’s UCK, IPSES)
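One way to read this structure as data: an expanded theory together with a mapping given as a program, which stays finite even when the mapping itself is infinite. A minimal sketch; the class `IPSES`, its fields, the date-based example and the containment check (expanded theory ⊇ producer’s and consumer’s theories) are assumptions for illustration, not the definition from the talk, which is itself described as incomplete.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class IPSES:
    """Hypothetical encoding: expanded UCK theory plus a symbol mapping."""
    expanded_T: FrozenSet[str]
    mapping: Callable[[str], str]  # a finite program for a possibly infinite mapping

    def covers(self, producer_T: FrozenSet[str], consumer_T: FrozenSet[str]) -> bool:
        """Assumed containment check: expanded_T ⊇ producer_T ∪ consumer_T."""
        return self.expanded_T >= (producer_T | consumer_T)


def ddmmyy_to_iso(date: str) -> str:
    """Map any dd/mm/yy string to yyyy-mm-dd."""
    return datetime.strptime(date, "%d/%m/%y").strftime("%Y-%m-%d")


producer_T = frozenset({"dates are dd/mm/yy"})
consumer_T = frozenset({"dates are yyyy-mm-dd"})
ipses = IPSES(
    expanded_T=producer_T | consumer_T | {"both notations denote calendar days"},
    mapping=ddmmyy_to_iso,
)
print(ipses.covers(producer_T, consumer_T), ipses.mapping("06/03/07"))  # True 2007-03-06
```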
Putting it All Together: General Ideas
What is preservation?
Preservation is the process of retaining the meaning of a digital object unaltered for readers with different backgrounds, software, hardware, etc.
What are the preservation types?
Bit Preservation: bits are not corrupt
Data Preservation: the bits’ format is understood/read
Information Preservation: the information is understood
Putting it All Together: Statics
What is a digital object?
A digital object is a sequence of bits (no meaning)
What gives meaning to a digital object?
The underlying (often implicit) format, knowledge, symbols’ meaning, etc., represented by the UCK
What should be preserved?
A set of questions and their answers
How do we determine the content of an IPS?
Preservation models can help
Putting it All Together: Dynamics (General)
Why is preservation needed?
Underlying knowledge (UCK) evolves; if digital objects remained the same, they would not be understood or would be misunderstood
When is preservation achieved?
When digital objects retain their meaning
Can other research fields help?
Belief Revision and Ontology Evolution, but only partially
Putting it All Together: Dynamics (IPSES)
How can we describe UCK evolution?
Using an expanded UCK, plus a mapping and a number of correspondences between the UCKs
Is preservation always possible?
No; various levels of preservation
How should digital objects evolve?
Open question; a function of the old digital object, the two UCKs and the UCK evolution information (IPSES)
Future Work
• Calculate the evolution of the digital object as a function of:
  – Old digital object
  – Producer’s UCK
  – Consumer’s UCK
  – IPSES (evolution information)
• Ongoing work: refinements might be required
• Extensive testing of the theory (real-world examples)
• Tie the theory to structures that are more useful in practice
Acknowledgements
This work was carried out during Giorgos Flouris’ tenure of an ERCIM “Alain Bensoussan” Fellowship Programme.
This work was partially supported by the EU project CASPAR (FP6-2005-IST-033572).
Preservation Types: Revisited
[Figure: three levels (Knowledge Level, Symbol Level, KR Level) with Information, Data (or Object) and Bit Preservation, as in the Preservation Types figure. The producer writes the symbol "Z" as the bits 01011010, meaning "The last letter of the English alphabet"; the consumer reads the same bits and the same symbol but understands "The 6th letter of the Greek alphabet".]
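The revisited case can be checked mechanically: bits and symbol agree, the concepts do not. A minimal sketch assuming 8-bit ASCII; the two dictionaries are hypothetical stand-ins for the producer’s and the consumer’s UCK.

```python
bits = "01011010"
symbol = chr(int(bits, 2))  # bit and symbol levels agree: "Z"

# Two hypothetical UCKs reading the same symbol.
PRODUCER_UCK = {"Z": "The last letter of the English alphabet"}
CONSUMER_UCK = {"Z": "The 6th letter of the Greek alphabet"}

same_data = symbol == "Z"
same_information = PRODUCER_UCK[symbol] == CONSUMER_UCK[symbol]
print(same_data, same_information)  # True False
```

Bit and data preservation both succeed here, yet information preservation fails, because the meaning lives in the UCK rather than in the object.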
Preservation Types: Joke Analogy
• In order to laugh at a joke, you must:
  – Hear the joke (bit preservation)
    The sound waves should reach your ears; if you are in another room, you won’t laugh at the joke
  – Understand the joke (data preservation)
    You should understand the language; if I tell a joke in Greek, you won’t laugh at the joke
  – Understand the context of the joke (information preservation)
    You should understand what the joke is about; if I tell a joke about the political situation in Greece, you won’t laugh at the joke
Statics: Underlying Community Knowledge (UCK)
• UCK: a logical formalism, plus a logical theory
• Because logics are:
  – Formal
  – Able to express knowledge
  – Suitable for capturing question answering (using inference)
  – A well-studied, mature, well-established field with rich results
  – Able to support theories that express background knowledge
• We don’t embrace any particular logic
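Question answering by inference can be illustrated with one particular logic. Since no specific logic is embraced, the choice below (propositional Horn clauses, decided by forward chaining) is only an example; the atom names and the function `entails` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (body atoms, head atom): body implies head


def entails(facts: Set[str], rules: List[Rule], query: str) -> bool:
    """Decide T ⊧ query for a propositional Horn theory by forward chaining."""
    known = set(facts)
    changed = True
    while changed:  # repeat until no rule adds a new atom (a fixpoint)
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return query in known


# Hypothetical UCK theory behind the temperature table.
facts = {"column_2_is_temperature", "community_uses_celsius"}
rules = [
    (frozenset({"column_2_is_temperature", "community_uses_celsius"}), "12_means_12_degrees_celsius"),
]
print(entails(facts, rules, "12_means_12_degrees_celsius"))  # True
print(entails(facts, rules, "community_uses_fahrenheit"))    # False
```

Forward chaining terminates because each pass either adds a new atom or stops, and only finitely many atoms occur in the rules.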
Contents of a UCK
[Figure: contents of a UCK. Labels: Producer, Intended Consumer, Digital Object; components shown: Common Knowledge; Knowledge P1, P2, P3; Knowledge C1, C2.]
Dynamics: Notes on IPSES
• IPS Evolution Structure (IPSES):
  – IPSES = UCK + mapping
• Exact specification of the change (no side-effects)
  – Usually change is partially specified (has side-effects)
  – Determining side-effects is orthogonal to preservation
• Change may be infinite (finite representation needed)
  – Example: Roman and Arabic numerals
  – Need Turing Machines to represent the mapping