“Hot Topics” in Long-term Preservation of Digital Objects

Borje Justrell
National Archives of Sweden

Aim of the Session

This session will focus on some topics in long-term digital preservation that are “hot” today at the Swedish National Archives.

The perspective is Swedish, but the chosen topics are intended to serve as examples of the discussion in the European archival community.

Programme

10.30 Introduction

- The Swedish archival framework

- Digital preservation – definitions and trends

11.00 Chosen topics

- Open data and the role of National Archives

- Transfer of electronic records

- Building a digital archive

12.00 End

The Swedish Archival Framework

Sweden, officially the Kingdom of Sweden, is a Scandinavian country in Northern Europe. At 450,295 square kilometres (173,860 sq mi), Sweden is the third-largest country in the European Union by area. With a total population of over 9.9 million, Sweden consequently has a low population density of 21 inhabitants per square kilometre (54/sq mi), with the highest concentration in the southern half of the country. Approximately 85% of the population lives in urban areas.

The Swedish Archival Framework

Some basic facts:

- Freedom of the Press Act (1766), which is part of the Swedish constitution; the Archives Act is based on it

- Public records

- Principle of openness and public access

- Record: could be textual or image based – or a data file or something else that can be read and understood only by using technical means

Laws and regulations affecting the work

The Swedish Archival Framework

Organisation:

- One archival institution for state archives (the National Archives), operating at 18 locations around the country and running a total of 13 physical reading rooms and one digital “reading room” on the Internet. About 500 employees.

- All municipalities (290 primary and 20 secondary) are to some extent independent. They act as “performers” under Swedish laws and state regulations and are responsible for their own archiving (under the Freedom of the Press Act).

Digital Preservation - Definitions

A major difficulty in digital preservation is the lack of a precise and definitive taxonomy of terms. Different communities use the same terms in different ways. Therefore, the definitions used in this session may not achieve widespread consensus across the wide range of cultural heritage institutions.

European calls for R&D projects often state that preservation has been achieved when digital objects remain accessible and usable to future users.

Preservation is NOT concerned only with sustaining single digital objects. Digital objects should be preserved in a context that makes them understandable and (consequently) usable.

Digital Preservation - Definitions

Digital objects

Range from relatively simple, text-based files (e.g. word processing files), to highly sophisticated web-based resources which fully exploit the benefits of technology by combining sound with images, the ability to link to other resources, and the ability to interrogate.

Include born digital objects, which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form (print out).

Digital Preservation - Definitions

Digital archiving

This term is used very differently within sectors. The library and archiving communities often use it interchangeably with digital preservation.

Computing professionals sometimes use digital archiving to mean the process of backup and on-going maintenance (including storage), as opposed to strategies for long-term digital preservation.

Digital Preservation - Definitions

Digital curation

Digital curation is often used in parallel with digital preservation; it has wider coverage and involves “maintaining, preserving and adding value to digital data throughout its life-cycle”.

http://www.dcc.ac.uk/digital-curation/what-digital-curation

Digital Preservation - Definitions

Digitisation

The process of creating digital files by scanning or otherwise converting analogue materials.

The resulting digital copy, or digital surrogate, could then be classed as a digital object to sustain, and is consequently subject to the same broad challenges of preserving its accessibility and usability as "born digital" materials.

Digital Preservation - Definitions

Authenticity

Confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made.

In the case of electronic records, authenticity refers to the trustworthiness of the electronic record as a record.

In the case of "born digital" and digitised materials, it refers to the fact that whatever is being cited is the same as it was when it was first created unless the accompanying metadata indicates any changes.

Digital Preservation - Trends

Open Data and the Role of National Archives

Open data in its broader meaning is data freely available to everyone to use and republish as they wish, without restrictions from any mechanisms of control including copyright and patents.

However, an internationally accepted (formal) definition is still lacking. Discussions have started about the need for standardisation, though it is still unclear what should be standardised.

Open Data and the Role of National Archives

In computing, linked data is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. To create linked open data means that data are not only open but also published in a machine-readable format and linked to other sources of data.

The diagram on the next slide shows which linked open datasets are connected, as of August 2014. It was produced by the Linked Open Data Cloud project, which started in 2007. Some datasets may include copyrighted data that is freely available.
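
The idea of interlinked datasets can be illustrated with a very small sketch. It is a simplification under stated assumptions: statements are held as plain (subject, predicate, object) tuples rather than in a real RDF store, the query is a simple pattern match rather than SPARQL, and every example.org URI is a made-up placeholder. Only the owl:sameAs and rdfs:label predicates are real, widely used identifiers.

```python
# Real, widely used predicates.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical dataset published by an archive (placeholder URIs).
archive_dataset = [
    ("http://example.org/archive/agency/42", LABEL, "Example Agency"),
    ("http://example.org/archive/agency/42", SAME_AS,
     "http://example.org/registry/org/A-17"),
]

# Hypothetical dataset published by someone else (placeholder URIs).
registry_dataset = [
    ("http://example.org/registry/org/A-17",
     "http://example.org/registry/terms/founded", "1901"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def follow_links(subject, local, remote):
    """Collect remote statements about resources that `local` equates with `subject`."""
    results = []
    for _, _, target in match(local, subject=subject, predicate=SAME_AS):
        results.extend(match(remote, subject=target))
    return results


linked = follow_links(
    "http://example.org/archive/agency/42", archive_dataset, registry_dataset
)
```

Because the archive's record points at the registry's identifier, a user starting from the archive can pick up facts that only the registry publishes. That is the extra usefulness that linking adds to data that is merely open.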

Open Data and the Role of National Archives

Open data is often recognised as a method to achieve a higher degree of transparency in governmental management and decision-making. In the EU, open government data initiatives build on the Union’s directive on Public Sector Information (PSI), which is implemented in the legislation of the Member States.

But not all PSI data are open, and not all open data are open public data.

Open Data and the Role of National Archives

[Diagram labels: Open data; Public data (PSI data); Open public data]

Open Data and the Role of National Archives

In Sweden, the National Archives has this year received a special assignment from the Government to foster and coordinate state agencies’ efforts to make their data available for wider use.

Open Data and the Role of National Archives

According to the Government’s decision, the National Archives shall mainly:

- collect and publish digital information that state agencies have to make public in accordance with the Swedish law on reuse of public records

- stimulate state agencies to publish open data

- administer and maintain the web portal for open data (already existing)

- support citizens in finding public data and help them contact the agencies that manage these data

Open Data and the Role of National Archives

This is a three-year assignment. After this period the outcome will be evaluated.

The reasons behind the Government’s decision are clearly stated:

It should be easy for citizens and companies to find the state agencies’ information. However, the agencies need support to make their information accessible in a uniform and cost-effective way.

Open Data and the Role of National Archives

But what about other types of open data than PSI data?

Still under discussion. Most obviously: use the assignment as a stepping stone towards a strategy on open data and linked open data.

A special secretariat at the National Archives has for some years looked into the challenges and opportunities in linked open data (including metadata standards and tools for mapping metadata between formats and standards).

Building up a digital archive

Conditions in the Beginning of the 21st Century

• No fixed transfer time; data files received from state agencies can be new or old ones.

• Transfers are negotiated between the agencies and the National Archives. Funding is remitted from the agencies to the National Archives to cover the preservation costs.

• When agencies are closed down, their archives are (by law) transferred to the National Archives.

• No common E-Archiving standard and Records Management standard in use; agencies implement their own (incompatible) solutions, developed by commercial software vendors.

Regulations for Digital Preservation

The National Archives issues regulations for digital preservation in the Swedish agencies (under the Archives Act)

Accepted file formats (media dependent rules)
– Text files (ISO 8859-1, Unicode)
– HTML
– XML (also GML and SGML)
– PDF (PDF/A-1)
– JPEG, TIFF and PNG
– MPEG
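
A list like this lends itself to a first, coarse screening step before a transfer. The sketch below is an illustration only, not the National Archives' procedure: the mapping from file extensions to the listed formats is an assumption made here, and an extension says nothing about whether a file really conforms (for example, whether a PDF is valid PDF/A-1). Real validation would need proper format identification tools.

```python
from pathlib import PurePath
from typing import Optional

# Assumption: a hypothetical mapping from common extensions to the formats
# named on the slide. The slide itself lists formats, not extensions.
ACCEPTED_EXTENSIONS = {
    ".txt": "Text file (ISO 8859-1 or Unicode)",
    ".html": "HTML",
    ".htm": "HTML",
    ".xml": "XML",
    ".gml": "GML",
    ".sgml": "SGML",
    ".pdf": "PDF (PDF/A-1)",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".png": "PNG",
    ".mpg": "MPEG",
    ".mpeg": "MPEG",
}


def classify(filename: str) -> Optional[str]:
    """Return the listed format an extension suggests, or None if not listed."""
    return ACCEPTED_EXTENSIONS.get(PurePath(filename).suffix.lower())


def screen(filenames):
    """Split filenames into (candidates by format, names needing attention)."""
    candidates, needs_attention = {}, []
    for name in filenames:
        fmt = classify(name)
        if fmt is None:
            needs_attention.append(name)
        else:
            candidates.setdefault(fmt, []).append(name)
    return candidates, needs_attention


candidates, needs_attention = screen(
    ["minutes.PDF", "register.xml", "budget.xlsx", "map.tif"]
)
```

Files that end up in `needs_attention` would typically be converted to an accepted format or discussed with the receiving archive before transfer.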

Digitisation activities

In-house scanning of documents, primarily church records, at the National Archives’ large-scale digitising facility MKC

In-house scanning of documents at the National Archives’ different locations, further processed at MKC or SVAR (the digital reading room)

In-house microfilm scanning at SVAR

Microfilm scanning by FamilySearch in Salt Lake City to be delivered to SVAR; primarily church records and judicial records.

Long-term Digital Storage at the National Archives (2016-11-01)

• Born-digital files from agencies: about 5 TB

• Audio-video files and multimedia: about 100 TB

• Digitised volumes (one AIP per volume): 466 225

• Digitised images (TIFF-format): 2473 TB
– Images in total: 179 million
– Images published on the Internet: 98 million

• DJVU-files (presentation format): about 30 TB

• Total storage: About 5 PB on tape. (All files are stored on two tapes)
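
The total can be cross-checked against the individual figures. The calculation below is a sketch under three assumptions that the slide does not state explicitly: that the four sized categories account for essentially all stored data, that "two tapes" means two full copies of everything, and that decimal units are used (1 PB = 1000 TB).

```python
# Sizes in TB, copied from the list above.
sizes_tb = {
    "born_digital": 5,
    "audio_video_multimedia": 100,
    "digitised_images_tiff": 2473,
    "djvu_presentation": 30,
}

COPIES = 2         # assumption: "stored on two tapes" means two full copies
TB_PER_PB = 1000   # assumption: decimal units

single_copy_tb = sum(sizes_tb.values())   # 5 + 100 + 2473 + 30
on_tape_tb = single_copy_tb * COPIES
on_tape_pb = on_tape_tb / TB_PER_PB

print(f"one copy: {single_copy_tb} TB, on tape: {on_tape_tb} TB ({on_tape_pb:.1f} PB)")
```

Under these assumptions one copy comes to 2608 TB and two copies to 5216 TB, which agrees with the stated total of about 5 PB on tape.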

Attributes of a Trusted Digital Repository (OCLC 2002)

• Compliance with the Reference Model for an Open Archival Information System (OAIS)

• Administrative responsibility

• Organisational viability

• Financial sustainability

• Technological and procedural suitability

• System security

• Procedural accountability

The OAIS model

An OAIS-compliant archive is built on six functional entities:

• Ingest

• Archival Storage

• Data Management

• Administration

• Access

• Preservation Planning

OAIS model

The National Archives Platform for Digital Preservation (RADAR)

[Diagram: components and actors of the platform. Swedish labels are translated.]

Components:

• RALF: application for control and preparation at the agencies

• KRAM: application for ingest and control

• Ingest from scanning

• ESSArch: archival storage system

• ARKIS: archival information system

• KRAM: access and dissemination of databases

Actors:

• Agency official (tjänsteman, myndighet)

• National Archives officials (tjänsteman, Riksarkivet)

• General public (allmänhet): search via NAD and the SVAR website

The Archival Storage System (ESSArch)

• ESSArch is a back-end system for archival storage

• Storage and retrieval of AIPs. Stores AIPs in several bitwise identical copies

• AIPs (containing data files and metadata in METS/PREMIS format) are stored in TAR format. No vendor-specific backup format

• Reads and writes checksums for packages and files

• Event log and access control

• Local MySQL database using the PREMIS 2.0 data model

• Automatic updates to the Archival Information System ARKIS

• ESSArch is an open source system based on Linux, Apache, MySQL and Python. ESSArch (version 2.1.0) is available at SourceForge (http://sourceforge.net/projects/essarch/)

• Used by the National Archives in Sweden and Norway
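
The combination of TAR packaging, bitwise identical copies and checksums can be sketched in a few lines. This is not ESSArch code: the function names and file names are hypothetical, and SHA-256 is an assumption made here, since the slide does not say which checksum algorithm the system uses.

```python
import hashlib
import io
import tarfile


def sha256_of(data: bytes) -> str:
    """Checksum of a byte string (the choice of SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_package(files):
    """Pack files into an uncompressed TAR and record a checksum per file."""
    checksums = {}
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in sorted(files):
            data = files[name]
            checksums[name] = sha256_of(data)
            info = tarfile.TarInfo(name=name)  # fixed default metadata
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue(), checksums


def verify_package(package: bytes, checksums):
    """Return the names of files that are missing or no longer match."""
    failed = []
    with tarfile.open(fileobj=io.BytesIO(package), mode="r") as tar:
        members = {m.name: m for m in tar.getmembers() if m.isfile()}
        for name, expected in checksums.items():
            member = members.get(name)
            if member is None:
                failed.append(name)
                continue
            if sha256_of(tar.extractfile(member).read()) != expected:
                failed.append(name)
    return failed


# Hypothetical package content.
files = {"content/record.txt": b"minutes of meeting", "metadata/mets.xml": b"<mets/>"}
package, checksums = build_package(files)
package_checksum = sha256_of(package)   # checksum for the package as a whole
second_copy = bytes(package)            # a bitwise identical copy
```

A plain TAR file can be read with standard tools on almost any platform, which is the point of avoiding a vendor-specific backup format. The package-level checksum lets two stored copies be compared without unpacking them, while the per-file checksums pinpoint which file has changed if a comparison fails.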

General Archival Standards

• ISAD(G) and ISAAR(CPF)
– The Archival Information System ARKIS is modelled after these standards

• EAD (Encoded Archival Description), an XML format for archival descriptions, and EAC-CPF (Encoded Archival Context), an XML format for the description of archive creators
– These formats are used as exchange formats for archival description information
– Supported by several commercial archival information systems
– Import and export functions in ARKIS
– A new Swedish EAD and EAC-CPF specification is currently being developed

Metadata standards for digital preservation

METS (Metadata Encoding & Transmission Standard) - Structure for encoding descriptive, administrative, and structural metadata (DLF/LOC) (2004)

PREMIS (Preservation Metadata) - A data dictionary and supporting XML schemas for core preservation metadata needed to support the long-term preservation of digital materials (OCLC/LOC) (2005)

MIX (NISO Metadata for Images in XML) - XML schema for encoding technical data elements required to manage digital image collections (ANSI/NISO) (2006)
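
To give a feel for what METS encodes, the sketch below builds a minimal METS-style document with Python's standard library. It uses the real METS and XLink namespaces and real METS element and attribute names (fileSec, fileGrp, file, FLocat, structMap, div, fptr), but it is only an illustration: the function name, file IDs and paths are hypothetical, descriptive and PREMIS metadata are left out, and the output has not been validated against the METS schema or any Swedish specification.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace: str, tag: str) -> str:
    """Qualified name in the {namespace}tag form that ElementTree expects."""
    return f"{{{namespace}}}{tag}"


def build_mets(files) -> ET.Element:
    """Describe each file in a file section and reference it from a structMap."""
    root = ET.Element(q(METS_NS, "mets"))
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"))

    for index, (path, data) in enumerate(sorted(files.items()), start=1):
        file_id = f"file-{index}"  # hypothetical ID scheme
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {
            "ID": file_id,
            "SIZE": str(len(data)),
            "CHECKSUM": hashlib.sha256(data).hexdigest(),
            "CHECKSUMTYPE": "SHA-256",
        })
        # Assumption: files are referenced by a path relative to the package.
        ET.SubElement(file_el, q(METS_NS, "FLocat"), {
            "LOCTYPE": "URL",
            q(XLINK_NS, "href"): path,
        })
        ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


mets = build_mets({"content/image0001.tif": b"fake image bytes"})
xml_text = ET.tostring(mets, encoding="unicode")
```

The file section carries the technical facts about each file (size, checksum, location) and the structural map says how the files belong together. Preservation metadata in PREMIS and image metadata in MIX would normally be embedded in, or referenced from, such a document.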

Other formats

ADDML (Archival Data Description Markup Language)

XML format, used by the National Archives of Norway and Sweden, for describing flat files exported from databases (2001, 2009).

An alternative to the Swiss SIARD format for databases.

Transfer of Data from State Agencies

E-archive project

To strengthen the development of eGovernment and create good opportunities for inter-agency coordination, the Government established a delegation for eGovernment. The delegation initiates strategic eGovernment projects, one of which concerns e-archives.

This project was headed by the National Archives but was in fact a joint effort involving several other governmental agencies as well as county councils and municipalities.

The goal: to build a foundation for the implementation of cost effective systems based on common specifications as opposed to isolated and incompatible systems for each agency (government, county council or municipality).

E-archive project

The first step: to create common specifications (CM) for exchange formats, and thus the interoperability needed to develop compatible E-Archive and Records Management systems. These specifications will use national adaptations of several international standards, such as EAD, EAC-CPF, PREMIS, METS, MoReq and others.

The project finished in 2014.

A maintenance organisation for the common specifications has now been set up.

System for long-term information retrieval

[Diagram: from agency systems to a long-term E-Archive]

Components:

• Records management system and other agency systems

• E-Archive run by an agency (in-house, or as an e-service provided by another agency or a commercial company)

• Long-term E-Archive at an archival institution such as the National Archives

• Search facilities

Actors:

• Agency employees

• General public

Flows:

• Transfer of electronic records from the business systems to the E-Archive

• Transfer of custody of the electronic records from the agency to an archival institution

Sub-project: Metadata for E-Archiving

• Developing a Swedish SIP based on standards such as METS and PREMIS

• For use in agencies as well as archival institutions
– Not only for delivery to the National Archives
– Ensure compatibility between different solutions and E-Archive implementations
– Generic structure: it should be possible to adapt the SIP to different information types, with basic metadata common to all information types

Sub-project: Metadata for E-Archiving

Current status

• Developing a Swedish SIP
– An official specification for a common SIP package structure was published in August 2015

• Content type specification
– A common content type specification (CM) for ERMS systems is currently being developed

Generic Package Structure for E-Archives

[Diagram labels: SIP package structure; Content type specification for ERMS systems; Content type specifications for other types; Modified specification]

Information Model of Packages

http://www.loc.gov/standards/mets/

From: Karin Bredenberg

Thank You!