DIACHRON Preservation: Evolution Management for Preservation

DESCRIPTION

By Giorgos Flouris (FORTH), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu

Text of DIACHRON Preservation: Evolution Management for Preservation

  • 1. Evolution Management for Preservation
      • PRELIDA Consolidation Workshop, 17.10.2014
      • Giorgos Flouris (FORTH), fgeo@ics.forth.gr

  • 2. Evolution Management Problem Preservation Evolution

  • 3. Change Detection
      • Change detection for evolution management
      • Identifying changes between versions
      • Challenges (in DIACHRON)
          1. Diverse data models
          2. Dynamic datasets
          3. Recoverable versions
          4. Changes as first-class citizens
          5. Cross-snapshot queries

  • 4. Evolution in DIACHRON
      • Pilot dataset, DIACHRON Version1
      • Pilot dataset, DIACHRON Version2

  • 5. Change Types: Motivation
      • What a naive diff will report
          Add (Rec, diachron:subject, EFO_001927)
          Add (Rec, diachron:hasRecordAttribute, rAtt1)
          Add (rAtt1, diachron:predicate, rdfs:subClassOf)
          Add (rAtt1, diachron:object, ObsoleteClass)
      • What the pilot expects
          Add_SuperClass (EFO_001927, ObsoleteClass)

  • 6. Change Hierarchy: Low-level (1/3)
      • Low-level changes
      • DIACHRON model, for internal use
      • Fixed: Add, Delete
      • Just additions and deletions of triples
      • Simple set difference

  • 7. Change Hierarchy: Simple (2/3)
      • Pilot terminology: Add_SuperClass, Add_Dimension
      • Fixed, pre-defined
      • Composed of low-level changes
      • Partitioning is perfect
      • Complete and unambiguous

  • 8. Change Hierarchy: Complex (3/3)
      • Pilot terminology: Add_Synonym, Mark_As_Obsolete
      • Totally custom, pilot-specific (defined at run-time)

  • 9. Using Changes for Evolution Management
      • DIACHRON data model contains all versions
      • Detection based on SPARQL queries
          • Provided at deployment time (for simple)
          • Generated at creation time (for complex)
      • Recoverability
          • Allows moving back and forth between versions

  • 10. Representation Requirements
      • Interesting queries
          • Return the simple changes that dataset X underwent between versions V1 and V2
          • Return the changes that resource X underwent in the first semester of 2014
          • Give me all resources of type X that underwent change Y
          • Return all countries for which the unemployment rate of their capital city increased at a rate higher than the average increase of the country as a whole, between versions V1 and V2
      • Access to both the changes and the data is required
      • Changes are first-class citizens
      • Allowing preservation

  • 11. DIACHRON Data Changes Ontology
      • [Figure: ontology diagram with two layers, Data level and Schema level. Labels: C1, Add_SuperClass, V1, V2, asc_p1, asc_p2, Simple_Change, Change, prov:Activity, EFO_001927, ObsoleteClass, old_version, new_version, diachron:Entity, Add_Synonym, Complex_Change]

  • 12. Conclusion
      • Main DIACHRON message
          • (Linked) data preservation is related to evolution management
      • DIACHRON challenges
          1. Diverse data models
          2. Dynamic datasets
          3. Recoverable versions
          4. Changes as first-class citizens
          5. Cross-snapshot queries
      • Solutions
          • DIACHRON data model (#1)
          • Appropriate change definition and detection (#2, #3)
          • Changes and data represented at the same level (#4, #5)
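
The change hierarchy described in the slides can be illustrated with a small sketch. It is not DIACHRON code: the slides state that detection is based on SPARQL queries, whereas this example swaps in plain Python set operations to keep it self-contained. The `Triple` type, the function names, and the `Rec0` / `EFO_000001` triple are assumptions made for illustration; only the four added triples and the expected `Add_SuperClass` result come from the "Change Types: Motivation" slide. The sketch computes low-level changes as a set difference, groups them into a simple change, and uses the stored delta to move back and forth between two versions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def low_level_diff(old, new):
    """Low-level changes: plain set difference between two sets of triples."""
    return new - old, old - new  # (added, deleted)


def detect_add_superclass(added):
    """Group added triples matching the slide's pattern into Add_SuperClass changes.

    Returns (changes, consumed), where consumed is the set of low-level
    additions explained by the detected simple changes.
    """
    # Index the objects of added triples by (subject, predicate) for lookups.
    index = {}
    for t in added:
        index.setdefault((t.s, t.p), set()).add(t.o)

    changes, consumed = [], set()
    for t in sorted(added):
        if t.p != "diachron:subject":
            continue
        rec, subject = t.s, t.o
        for att in sorted(index.get((rec, "diachron:hasRecordAttribute"), ())):
            if "rdfs:subClassOf" not in index.get((att, "diachron:predicate"), ()):
                continue
            for sup in sorted(index.get((att, "diachron:object"), ())):
                changes.append(("Add_SuperClass", subject, sup))
                consumed |= {
                    t,
                    Triple(rec, "diachron:hasRecordAttribute", att),
                    Triple(att, "diachron:predicate", "rdfs:subClassOf"),
                    Triple(att, "diachron:object", sup),
                }
    return changes, consumed


def roll_forward(old, added, deleted):
    """Rebuild the newer version from the older one and the delta."""
    return (old - deleted) | added


def roll_back(new, added, deleted):
    """Rebuild the older version from the newer one and the delta."""
    return (new - added) | deleted


# Hypothetical triple present in both versions (not from the slides).
unchanged = Triple("Rec0", "diachron:subject", "EFO_000001")
v1 = {unchanged}
# The four additions are the ones listed on the motivation slide.
v2 = v1 | {
    Triple("Rec", "diachron:subject", "EFO_001927"),
    Triple("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    Triple("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    Triple("rAtt1", "diachron:object", "ObsoleteClass"),
}

added, deleted = low_level_diff(v1, v2)
changes, consumed = detect_add_superclass(added)
print(changes)
```

In this toy case every low-level addition is consumed by exactly one simple change, which is the property the slides call a perfect partitioning; a fuller sketch would need one detector per simple change type and a check that no low-level change is left unexplained.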