100 Million Hours of Audiovisual Content: Digital Preservation and Access in the PrestoPRIME Project

Matthew Addis
IT Innovation
2 Venture Road
Southampton, UK
[email protected]

Walter Allasia
EURIX R&D
Via Carcano, 26
Torino, Italy
[email protected]

Werner Bailer
JOANNEUM RESEARCH
Steyrergasse 17
Graz, Austria
[email protected]

Laurent Boch
RAI CRIT
Corso Giambone, 68
Torino, Italy
[email protected]

Francesco Gallo
EURIX R&D
Via Carcano, 26
Torino, Italy
[email protected]

Richard Wright
BBC R&D
56 Wood Lane
London, UK
[email protected]

ABSTRACT

We report the preliminary results of PrestoPRIME, an EU FP7 integrated project involving audiovisual (AV) archives, academic and industrial partners, focused on long-term digital preservation of AV media objects and on improving access by integrating media archives with European on-line digital libraries, specifically Europeana. Project outcomes will result in tools and services to ensure the permanence of digital AV content in archives, libraries, museums and other collections, enabling long-term future access in dynamically changing contexts. PrestoPRIME has a special focus on digital preservation in broadcast environments, where very large files of digital video must be preserved at high quality (suitable for future re-use in an AV production environment) in affordable distributed and federated archives. The adoption of standard solutions for digital preservation processes (metadata representation, content storage, digital rights governance, search and access) enables the interoperability of the proposed preservation framework and guidelines. The OAIS model was chosen for the reference architecture, METS is adopted as the wrapper for metadata representation, while relevant standards (e.g. from W3C, ISO/IEC and others) are used for content and rights description. Project outcomes will be delivered through a European networked Competence Centre, to gather knowledge and deliver advanced digital preservation advice and services in conjunction with Europeana and other initiatives.

Categories and Subject Descriptors

H.3.7 [Information Storage And Retrieval]: Digital Libraries; K.4.1 [Computers And Society]: Public Policy Issues - Intellectual Property Rights; H.5.1 [Information Interfaces And Presentation]: Multimedia Information Systems - video

General Terms

Algorithms, Design, Standardisation

Keywords

Digital Preservation, Long-term Preservation Framework, Broadcast, Audiovisual archives

1. INTRODUCTION

There are millions of hours of content in collections of dedicated AV archives, other archives, libraries and museums. The responsible staff have decades of experience in dealing with physical carriers: visible, held in the hand, stored on shelves, and played by dedicated devices. Handling AV content is currently undergoing a dramatic change as content is being migrated to files: invisible, stored “in the cloud” and played by unknown technology at a remote machine. How can these collections be efficiently managed, so that files don’t get lost (metadata, file and storage management), keep their correct relationships with other files (provenance), and maintain their technical (quality), legal (rights) and archival (quality, provenance, rights, metadata) integrity? How can AV content be preserved forever (without breaking almost immediately)? And finally, all this has to be done at the lowest possible cost. The purpose of PrestoPRIME is to supply answers, and tools that implement those answers.

The paper is organised as follows. The context is presented in Section 2, followed by a description of the preservation framework in Section 3. More details on the architecture and technologies are given in Section 4. Section 5 is specifically about the issue of AV rights management, while Section 6 focuses on standards and best practices and how they relate to the PrestoPRIME work.

2. BACKGROUND

In this section we first present an overview of the objectives and tasks of PrestoPRIME and a brief description of the scenarios covered, in terms of the institutions and organisations that hold AV collections. We then describe the relevant features of the AV content to be preserved and the impact of failure or omission. Finally, we recall the further complications for archive management that arise from the dynamic AV content lifecycle.

2.1 Overview of the PrestoPRIME project

The PrestoPRIME project addresses the objective of keeping AV content alive within the digital domain. The most prominent requirement is to provide means for implementing long-term digital preservation of AV files. Once preservation is ensured, it is necessary to guarantee future users access to archived items, either for simple viewing or for re-use as part of new multimedia products. Future access is the final goal of today’s preservation activities.

The digital preservation process for AV material is expensive (compared to the preservation of other content types) and each action performed on the content can irrevocably affect future access to it. In order to optimise the digital preservation process, a crucial task for PrestoPRIME is the identification of the possible models and strategies to be adopted, according to various contexts and conditions. For each envisaged strategy, the appropriate set of tools is considered, developed, and tested. In addition to the traditional methods based on file format migration, PrestoPRIME is also investigating a multivalent approach [20]. Other important topics for PrestoPRIME are quality assessment and storage management.

Regarding access and metadata, the descriptive information must be kept up to date with respect to changing contexts, both by making the various annotation schemes adopted over time interoperable and by involving users in a continuous process of enrichment. The ability to track the re-use of material, with specific tools, will support various preservation activities, such as identification of the best-quality copy and the resolution of some exploitation rights dilemmas. The ability to handle rights is required for optimising the use of archival items, through which the preservation process is partially funded. This area, often neglected, is also addressed by the project.

PrestoPRIME will deliver not only a set of tools and technologies, but also a complete architecture and framework for integrating all the components, after a process of assessment and validation. The same architecture will be adoptable for both open-source implementations and commercial systems. The results achieved by the project will also exploit the outcomes of other projects and initiatives relevant to digital preservation, as described in the following. It is worth mentioning that the OAIS model [6] has been chosen as the reference for the architecture design; in the following we assume that the reader is familiar with OAIS terminology and main concepts.

2.2 Scenarios regarding content holders

The content holders addressed by PrestoPRIME are generally those holding AV collections. The most relevant scenarios which have been identified are related to broadcast archives, Higher Education Institutions and small archives, as described in the following.

Broadcast archives, such as BBC, ORF, RAI, B&G and INA (which are all partners of PrestoPRIME), own large collections of AV material in SDTV and HDTV formats. Their business model includes the roles of content creators, distributors (to the users) and archives. The archive is an important potential source of material for new productions and publications, where archival items are expected to be used in the short, medium and long term. Higher Education Institutions, such as universities or other research centres, are not strictly focused on AV content, but AV content is increasingly related to their activities. Their AV collections are often very large and the need to deal with AV preservation requires the acquisition of new technical skills. Small archives experience many problems in reaching the critical size for setting up an affordable digital archive process for their assets, therefore they have to rely on services provided by other organisations.

The content holders mentioned above require different approaches to digital preservation. PrestoPRIME has devoted much effort to the dissemination of its results, organising public events and collecting information through online surveys, in order to identify the most relevant issues and requirements that such content holders have to face. The results produced by the project will be publicly available as soon as they are finalised, in the form of documents, guidelines and software tools. The aim of PrestoPRIME is also to create a European Competence Centre which will continue beyond the project lifetime and will provide the outcomes of the project, creating a network for all actors involved in digital preservation.

2.3 AV content in PrestoPRIME context

PrestoPRIME aims to deal with all kinds of AV content, including content originating from specific AV archives (such as broadcast or film archives) as well as AV holdings of libraries and museums. The preservation platform provided by the project (see Section 4) will be accessible by these users and also by other software systems such as other OAIS compliant archives and service providers. Another important issue is interoperability with existing initiatives in the cultural heritage community, most prominently Europeana.

Concerning content access, PrestoPRIME adopted a standard approach based on the OAI-PMH protocol and the MPEG Query Format (MPQF) [7], in order to enable the query-by-content paradigm, which PrestoPRIME partners have experimented with in previous EU projects, such as SAPIR¹.
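As an illustration of the harvesting side of this approach, the following minimal sketch builds an OAI-PMH request URL. The `verb` and `metadataPrefix` arguments and the `oai_dc` prefix are part of the OAI-PMH protocol; the endpoint URL and the helper name `oai_pmh_url` are assumptions made for the example and do not refer to an actual PrestoPRIME service.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://archive.example.org/oai"

# Ask the repository for all records in unqualified Dublin Core.
url = oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

A harvester would issue an HTTP GET on this URL and parse the XML response; that part is omitted here.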

Broadcast archives have millions of hours of AV material (mostly not yet digitised) and the amount of content is growing very fast. Content size depends on compression schemes and the related bit rate, and must be considered in relation to the expected quality level. Master quality level is good for production purposes, and any lower quality can be derived from a master level copy. Reference values for the content size can be given in tens of GB per hour (for example SDTV production is typically at 30 GB/hour, while HDTV premium production may need 200 GB/hour). Exploitation quality level is good for current commercial purposes and simple production processes, such as news programmes. Archives are expected to work for dissemination at the exploitation level, where size is much lower than for master (the ratio can be as low as 1:10). Another level is the Broadcast/Publication quality, which depends on the publication medium and whose size is lower than exploitation level. Finally, the Browsing quality level is sufficient to search, identify and select content; its size is negligible compared to the master level, but it is not suitable for user consumption.
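The reference values above translate directly into storage estimates. The sketch below works through the arithmetic for a hypothetical collection of one million hours; the collection size and the function name are assumptions made for illustration, while the per-hour figures and the 1:10 ratio are the ones quoted in the text.

```python
# Reference values quoted above, in GB per hour of content.
GB_PER_HOUR = {"sdtv_master": 30, "hdtv_master": 200}

# Exploitation copies can be as small as one tenth of the master (1:10).
EXPLOITATION_DIVISOR = 10


def storage_tb(hours, gb_per_hour):
    """Storage needed for `hours` of content, in decimal terabytes."""
    return hours * gb_per_hour / 1000


hours = 1_000_000  # hypothetical collection size

sd_master_tb = storage_tb(hours, GB_PER_HOUR["sdtv_master"])
hd_master_tb = storage_tb(hours, GB_PER_HOUR["hdtv_master"])
sd_exploitation_tb = sd_master_tb / EXPLOITATION_DIVISOR

print(f"SDTV master:       {sd_master_tb:,.0f} TB")
print(f"HDTV master:       {hd_master_tb:,.0f} TB")
print(f"SDTV exploitation: {sd_exploitation_tb:,.0f} TB")
```

Under these assumptions a single SDTV master copy of the collection needs about 30 PB, which is why the choice of quality level dominates storage planning.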

¹ http://www.sapir.eu

Production quality (master quality level), when recorded in digital AV media files, requires a large amount of storage capacity. This can be reduced only by adopting complex compression techniques, which may affect the quality of the material (lossy compression) or not (lossless compression). Uncompressed video, for instance, typically requires about 80 GB per hour at standard definition, compared to the 30 GB per hour required for a very good compression format. The migration of AV content from one format to another, when dealing with uncompressed videos, is a critical issue. In the long term, even guaranteeing access to AV archive material in a compressed format will only be possible if the required tools, decoders and players are still available, together with the technical environment which they require to run correctly. PrestoPRIME is investigating these problems in depth, evaluating several approaches, such as the multivalent approach, which has been experimented with in other EU projects, such as SHAMAN [20].

The metadata models used in AV archives focus on both technical and descriptive metadata. The choice of the metadata model is often determined not by the type of content being described, but by the collection holding it: a broadcast archive will typically use a format designed for AV content, such as EBU P_META [15], while AV content held by a library or museum is often treated like other types of items in the collection. While the models coming from the domain of AV archives are in most cases able to hold detailed descriptive metadata, they mostly lack support for preservation metadata (as can be found in standards from the library community, such as PREMIS [16]) and provenance information.

Preserving digital AV content means keeping it accessible and usable for a long time, including the metadata describing it. This requires migrating metadata to models currently in use and enabling metadata mapping to formats used in the systems which need to retrieve, deliver, display and process archived content. Many current approaches for publishing and mapping metadata are highly limited by mapping to models that represent least common denominators of the models used in the related collections. New ways of dealing with metadata, using approaches based on emerging Semantic Web technologies and Linked Data, could overcome these limitations. Technical and descriptive metadata must also include information related to the preservation plan within the archive. In addition, users can add annotations to the content available in the system, and user generated metadata are fed back to enrich the existing documentation of the content. This field is also under study in PrestoPRIME; an example of such a user annotation tool is the Waisda? tagging game² developed by project partners.

² http://www.waisda.nl/

Finally, AV content is characterised by the time dimension. Playing AV is a presentation of content along its timeline, and editing, seeking a specific point, describing and indexing segments and in general any action or process on AV material is affected by the need to manage the timeline properly. PrestoPRIME will provide a solution for the representation of temporal information based on relevant standards, such as W3C Media Fragments [25] or MPEG-21 DIDL [5]. In particular, PrestoPRIME investigates the relevance of temporal decomposition related to digital rights, where each segment of a given content item can be associated with a different license.
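A minimal sketch of such a temporal decomposition is given below. It is not the PrestoPRIME data model: the class `RightsSegment`, the license identifiers and the media URL are hypothetical. The only standard element used is the temporal dimension of W3C Media Fragments URIs (`#t=start,end`, in seconds).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsSegment:
    """A time interval [start, end) in seconds, tied to one license."""
    start: float
    end: float
    license_id: str  # hypothetical identifier of a license record


def fragment_uri(media_uri, segment):
    """Address a segment with a Media Fragments temporal fragment."""
    return f"{media_uri}#t={segment.start:g},{segment.end:g}"


def licenses_for(segments, start, end):
    """Licenses of every segment that overlaps the interval [start, end)."""
    return sorted({s.license_id for s in segments
                   if s.start < end and start < s.end})


# Hypothetical programme: the first minute and the following
# thirty seconds are covered by different licenses.
segments = [
    RightsSegment(0, 60, "license-A"),
    RightsSegment(60, 90, "license-B"),
]

print(fragment_uri("http://example.org/programme.mxf", segments[1]))
print(licenses_for(segments, 50, 70))
```

Re-using a clip that spans 50 s to 70 s would therefore require clearing both licenses, which is the kind of question a timeline-aware rights index is meant to answer.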

2.4 The impact of content loss

The value of lost content is often difficult to estimate. AV content records the memory of times and places, the voices and images of people, and content created in the past cannot be produced again, except as fiction. The particular nature of the AV document is also relevant, because there may be many copies, versions or other content which may or may not be able to replace the lost material. Additionally, the production of new content is generally more expensive than re-using pre-existing material. If appropriate content is not found in the archive, there is a clear impact on production costs. The proliferation of means of publication of multimedia is economically possible only because of the reuse of archive material.

Technical quality is also a value. If the highest quality material is lost and only a lower quality copy is available, this can sometimes be enough, but the future use of poor quality material will have a great impact, especially for professional uses. Presentation technologies, i.e. screens and loudspeakers, will continue to change and improve, therefore preserved low quality material will be at serious risk of appearing old and technically inadequate. This would be a waste of archive and preservation efforts.

2.5 Content lifecycle

The mass digitisation of analogue archive holdings plus the transition to tapeless production for new content means AV archives inevitably face the prospect of adopting file-based solutions using IT storage technology. Audiovisual archives are typically very active places as a result of continual content acquisition, access and reuse. AV archives are increasingly becoming directly integrated into wider content production and consumption processes, yet somehow need to achieve content permanence in a world where almost everything seems to be transient and concerned with the ‘here and now’ and not the future. Long-term safe storage of assets for ten, fifty or even one hundred years is very difficult to achieve in an environment where little stays the same for more than a few years. This dynamic world brings with it the need for archives to accommodate AV content in a wide range of contemporary formats, both for submission and access, and to integrate a wide range of supporting technologies (storage, coding, wrapping, distribution, quality control etc.). All of these tend to have relatively short lifespans and hence migration to deal with technical obsolescence becomes a way of life. Likewise, the access and use of the AV content often varies during its lifecycle, often unpredictably and with unforeseen ‘step changes’, e.g. as a result of a move to public access. This adds further complications to archive management.

3. PRESERVATION FRAMEWORK

In PrestoPRIME, digital preservation will be extended to take into account all digital content plus that provided by broadcasters, which brings something new to the field of digital preservation: high quality videos, which require innovative approaches for monitoring status, storing and managing obsolescence. We are facing the challenge of dealing with compressed and uncompressed files, audio and video format changes, and the very large files of the high quality master copy version.

Figure 1: PrestoPRIME architecture layers

PrestoPRIME is developing tools and techniques to support the transition to file-based content and IT systems that AV archives are currently making. Our technology developments include tools for calculating the long-term Total Cost of Ownership (TCO) of these systems, planning and optimising file format migration, assessing and comparing which storage technologies or services to use, managing the risks involved, selecting suitable preservation strategies (e.g. file format migration, emulation, multivalent), and automating content quality control, e.g. defect detection and analysis.

Recognising that automation is the key to lowering costs (and risks), PrestoPRIME is developing supporting digital preservation infrastructure components that go beyond the OAIS functional model and provide automation and execution of preservation policies and processes, e.g. to make use of managed storage as a service that conforms to defined Service Level Agreements and QoS.

The architecture of the preservation framework is made up of three layers, which are described in the following with reference to Figure 1: application, automated management and manual management layers.

The application layer (application channel) at the bottom contains the services that deliver preservation and access, e.g. the tools and services that would be found within the main functional areas of OAIS. Considering the application channel as a set of services, each of which has an SLA and defined QoS, allows them all to be governed in a consistent way. The automated management layer in the middle (management channel) automates the management of the services and also of the customer and supplier relationships. The SLA manager deals with customer SLAs (those of the consumer and producer) which set out the constraints and service level objectives (SLOs) on ingest and access. The resource manager deals with supplier SLAs (such as out-sourced storage or compute facilities) and with in-house resources. The service manager balances commitments to customers with resources available internally and from external suppliers. This is an event-decision-action loop, where the decision is made according to a policy. The manual management layer at the top (decision support) is where people design, test and set the policies to be executed by the automated management layer. The decision support tools use models of the services which may be updated by real-world experience. This layer is where “intelligence” can be provided by a combination of automatic and manual analysis supported by system modelling tools. The output of this analysis is the management policies. The service manager in the management channel uses these to decide which action to take to meet its commitments, to plan to continue meeting them, and to mitigate the effects of events (e.g. failures) that cause it to stop meeting them.

Figure 2: PrestoPRIME preservation modelling

The key feature of the PrestoPRIME governance architecture is the ability to decouple the management and the services as much as possible, e.g. through local autonomy where services understand the parts of the policies (e.g. security) relevant to them, therefore being able to make their own immediate decisions. The monitoring and management loop (between the application and management channels) can then be asynchronous, with the service manager requesting usage reports (either queued or instantly generated) from the services periodically (as a pull) and then updating the services’ policies as necessary in a slower time-frame.
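The event-decision-action loop and the periodic pull of usage reports can be sketched as follows. This is an illustrative toy, not the PrestoPRIME service manager: the report fields, service names, thresholds and action names are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class UsageReport:
    service: str
    availability: float  # fraction of time the service was reachable


# Hypothetical policy, as it might be set in the decision support layer:
# a minimum availability per service and the action to take on a breach.
POLICY = {
    "archival_storage": {"min_availability": 0.999,
                         "action": "replicate_to_second_site"},
    "access": {"min_availability": 0.99,
               "action": "add_delivery_capacity"},
}


def decide(report, policy):
    """Decision step: compare one usage report with the policy."""
    rule = policy.get(report.service)
    if rule is None:
        return "notify_operator"  # no policy defined: escalate to a person
    if report.availability < rule["min_availability"]:
        return rule["action"]
    return "no_action"


def management_cycle(pull_reports, policy):
    """One pass of the loop: pull reports, decide, return chosen actions."""
    return [(r.service, decide(r, policy)) for r in pull_reports()]


def pull_reports():
    # Stand-in for the periodic pull of usage reports from the services.
    return [
        UsageReport("archival_storage", 0.995),
        UsageReport("access", 0.999),
        UsageReport("ingest", 0.90),
    ]


actions = management_cycle(pull_reports, POLICY)
print(actions)
```

Keeping the policy as plain data separates it from the loop that executes it, mirroring the split between the manual and automated management layers.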

The decision support tools are essentially about preservation planning, with an emphasis on risk management and automation of preservation actions, i.e. policy definition. As shown in Figure 2, the objective is to convert an archive’s needs (how much content, how long to keep it, how safe it needs to be, and who needs to access it) into a preservation plan (what to do, when to do it, what the consequences will be). The tools combine preservation modelling (e.g. file level preservation approaches) with storage modelling (bit level preservation approaches) and consider the interplay between these two (e.g. the choice of file format impacts on storage needs as well as on sensitivity to corruption in the storage layer).

Figure 3: PrestoPRIME file format migration

The result is a preservation plan (a set of automatable policies plus projections of cost, access and loss over time). The policies include preservation actions at all levels, which include periodic fixity checks and repair at the bit level (e.g. disc scrubbing), plans for large scale file format or storage migrations, how to deal with conflict (e.g. preservation actions such as migration or fixity checking can consume resources that might otherwise be needed to deliver access to content), and security (e.g. authentication, access control, rights management).
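As a concrete instance of a bit-level preservation action, the sketch below performs a fixity check by comparing stored SHA-256 digests with freshly computed ones. The choice of SHA-256, the manifest layout and the function names are assumptions for the example; the text does not prescribe a particular checksum or tool.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Digest a file in chunks, so that very large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(manifest):
    """Return the paths whose content is missing or no longer matches.

    `manifest` maps a file path to the digest recorded at ingest.
    """
    failures = []
    for path, expected in manifest.items():
        if not os.path.isfile(path) or sha256_of(path) != expected:
            failures.append(path)
    return failures


# Demonstration on a temporary file standing in for an AV essence file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "essence.bin")
    with open(path, "wb") as handle:
        handle.write(b"audiovisual essence")
    manifest = {path: sha256_of(path)}

    failures_before = fixity_check(manifest)

    # Simulate silent corruption of a single byte.
    with open(path, "wb") as handle:
        handle.write(b"audiovisual essencf")
    failures_after = [os.path.basename(p) for p in fixity_check(manifest)]

print(failures_before, failures_after)
```

A repair step would then restore each failing file from another copy; detection alone is shown here.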

In each of these areas, the best course of action is a function of time and hence the policies require ongoing review. For example, the choice of file format to use for a particular AV asset depends on available and projected tool support, storage capacity and budget, along with requirements for quality, safety and access. A simplified diagram for video file format migration is shown in Figure 3; further details can be found in project deliverables [17].

4. ARCHITECTURE & TECHNOLOGIES

PrestoPRIME aims to provide an open, flexible and standard system for digital preservation. Besides following the MDA [12] approach, all the basic functionalities will be exposed as services, leading towards an ESB (Enterprise Service Bus) architecture.

Following the OAIS specifications and Object Oriented Programming guidelines, and considering previous experiences such as the CASPAR project, the PrestoPRIME Preservation Platform takes into account the best practices of commercial systems such as Rosetta, developed by ExLibris [3], one of the current market leaders and a partner of PrestoPRIME.

The overall approach for PrestoPRIME software development makes use of well established best practices and design patterns already used for solving software problems. In order to achieve the highest flexibility and to enable easy plug-in of new software components into the platform, an ESB architecture is adopted. Services are provided via SOAP (Simple Object Access Protocol) whenever a specific schema is required, leaving REST (REpresentational State Transfer) to manage the protocol necessary to exchange simple XML documents.

The architecture design contains the main OAIS functional blocks mapped into software components: ingest, access, administration, preservation planning, data management and archival storage.

If a pre-existing digital preservation system follows the OAIS guidelines, the PrestoPRIME solution can be easily integrated and can help to manage specific issues not already covered. Otherwise the PrestoPRIME platform could be used to migrate an old archive to a new one, based upon standards and best practices. Finally, individual tools can be extracted and plugged into an existing system. Recognising that automation is the key to lowering costs (and risks), PrestoPRIME is supporting digital preservation infrastructure components that have the ability to execute preservation policies and processes and make use of managed storage as a service that conforms to defined Service Level Agreements and QoS.

One way to avoid the act of design is to reuse existing designs: according to the MDA [12] approach, the overall analysis and design of the PrestoPRIME preservation platform reference architecture maps the six OAIS logical blocks onto software components, making use of available design patterns. Generative patterns are used for addressing the specific requirements of each functional block. The design patterns solving the problems are recognised and brought together in order to build up the software component diagram. Figure 4 shows a simplified overview of this diagram, pointing out the conceptualised component of each block. The OAIS activities and functions in each block are omitted, since the diagram is focused on the software implementation (actually the Platform Independent Model [12] design).

A workflow module, not shown in the diagram, is the software component intended for managing the various process activities run under the platform’s control, such as ingestion and preservation. A further component not foreseen in the OAIS model has been added to the figure (top left): the Preservation Registry, which the platform has to contact during most of the processes, such as ingestion (SIPProcessing) and RiskAnalysis (Preservation Planning). It is made up of three main components: Format, Risk and Application.

The Format component is responsible for the identification of the format and type of the digital content. Associated with the Format are the MetadataExtractors that are available for a specific Format and type (for example for pdf Format and Type version 2). The Risk component represents the risk associated with a specific Format (and type). The Application component is split into Editing and Rendering applications, for example for coding video content or playing a movie respectively. This kind of information is stored in the Preservation Registry, which should be distributed. In our context, there is a Registry for each Archive and it is planned to have a centralised one gathering the information from all the archives, in order to provide a federated service for automatic preservation.
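The structure of the Preservation Registry described above can be sketched as a small data model. The class and field names, the risk levels and the tool names below are assumptions made for illustration and are not taken from the PrestoPRIME implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FormatEntry:
    """Registry record for one format and type (version)."""
    name: str
    version: str
    risk: str = "unknown"  # illustrative scale: low / medium / high
    metadata_extractors: list = field(default_factory=list)
    editing_apps: list = field(default_factory=list)
    rendering_apps: list = field(default_factory=list)


class PreservationRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[(entry.name, entry.version)] = entry

    def lookup(self, name, version):
        return self._entries.get((name, version))

    def formats_at_risk(self, level="high"):
        return sorted(key for key, entry in self._entries.items()
                      if entry.risk == level)

    def merge_from(self, other):
        """Federation sketch: copy entries not yet known to this registry."""
        for key, entry in other._entries.items():
            self._entries.setdefault(key, entry)


# One registry per archive, plus a central one gathering their entries.
archive_registry = PreservationRegistry()
archive_registry.register(FormatEntry(
    "pdf", "2", risk="low",
    metadata_extractors=["example-extractor"],   # hypothetical tool names
    rendering_apps=["example-viewer"]))
archive_registry.register(FormatEntry("legacy-video", "1", risk="high"))

central_registry = PreservationRegistry()
central_registry.merge_from(archive_registry)
print(central_registry.formats_at_risk())
```

A risk analysis step could then query `formats_at_risk()` to select candidates for migration.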

The detailed description of the six OAIS blocks depicted in Figure 4 is beyond the scope of this document. Further information can be found in PrestoPRIME deliverables [17]; the full details of the architecture design and software implementation will appear publicly in Summer 2010 [18].

Figure 4: PrestoPRIME Preservation Platform Component Diagram

5. AUDIOVISUAL RIGHTS MANAGEMENT

Rights clearance and management operations are becoming the major bottleneck in the exploitation of archived AV content, whether analogue or digital. The original contracts often need to be analysed and interpreted by specialists, which considerably increases costs and causes delays: vast portions of the archive remain unused due to uncertainty over rights. The sheer number of rights at the EU level illustrates how difficult it will be to create an automatic rights system.

Rights statements are included in contracts written in free form, without an agreed and unambiguous terminology. Statements can apply to different parts of the content, but as the contract is often drafted before the content is produced, there is no explicit association between shots and rights. Furthermore, rights can have a temporal span and are often reacquired or renegotiated subsequently. Different regulations apply across Europe, expressed with different terminologies, with different approaches based on different legal traditions. The existence of a common rights model would greatly simplify rights management, as it would be straightforward to extract and interpret rights clauses from contracts and map them to the content timeline, like any other metadata.

PrestoPRIME aims to develop an infrastructure for media content provenance and tracking, and a European rights ontology, as the basis for an integrated system of rights management throughout the lifecycle from acquisition, to editing, to archive and distribution. This will be achieved by: developing a European rights ontology and associated data model, based on the analysis of current rights typologies associated with broadcast content in Europe, which will be included in a common rights glossary; delivering a rights management system that will extract and interpret rights clauses from contracts, map them to the content timeline as metadata, and index the digital rights metadata within the archive, enabling the search for content based on digital rights information; guaranteeing interoperability between archives, other heritage institutions, the Semantic Web and UGC by adopting a solution compliant with the most relevant standards, such as MPEG-21 REL [10] and ODRL³.

PrestoPRIME is focusing on open standards, also by defining profiles that specify subsets or combinations of two or more standards when necessary. The broadcaster partners will contribute to such standards with the requirements and best practices for contracts and licenses relevant in their context.

6. STANDARDS AND BEST PRACTICES

This section discusses various standards initiatives around preservation of AV content. We put the main focus on formats for metadata and containers, as those are specific to AV content, while standards for system design or retrieval tend to be more generic (being only adapted to AV content in the context of a certain implementation).

³ Open Digital Rights Language - http://odrl.net

6.1 METS

Metadata Encoding and Transmission Standard (METS) [13] is a standard representation for expressing the hierarchical structure of digital library objects, including the names and locations of the files that comprise those objects and the associated metadata. METS allows the use of externally developed metadata schemes, which can be fitted into its two defined metadata sections. METS itself does not prescribe the descriptive or administrative metadata schemes that are incorporated by implementers.

Some community based standards are recognised by the METS board and one of them is PREMIS [16], used to describe preservation metadata. To use PREMIS together with METS some decisions have to be made in advance, as the PREMIS schema was developed in an implementation neutral way and METS also has considerable flexibility. Several issues exist: there are redundancies between PREMIS and METS, PREMIS schemas can be used in a number of METS sections, and it must be decided whether to use the PREMIS container schema and how to deal with format specific metadata within PREMIS. To address these issues, guidelines have been developed jointly by the METS and PREMIS communities [16].

METS is one of the most commonly used containers (“transfer syntax”) for representing the different types of packages in an OAIS-compliant system (SIP, AIP, DIP). As METS still provides flexibility in terms of the descriptive, preservation and rights metadata used (or referenced), it has been decided to adopt METS also as the container for the packages in the PrestoPRIME system.
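To make the container role of METS more concrete, the following minimal sketch assembles a small METS document that wraps one PREMIS event and references one essence file. It is an illustration under stated assumptions, not the PrestoPRIME package format: the function name `build_package`, the identifiers, the file URL and the choice of the PREMIS 2.x namespace are ours, and the output has not been validated against the METS or PREMIS schemas.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS 2.x namespace

for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("premis", PREMIS_NS)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{ns}}}{tag}"


def build_package(object_id, essence_href):
    """Return a minimal METS tree with one PREMIS event and one file (hypothetical helper)."""
    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id, "TYPE": "SIP"})

    # Administrative metadata: a PREMIS event wrapped inside digiprovMD.
    amd = ET.SubElement(mets, q(METS_NS, "amdSec"), {"ID": "amd-1"})
    prov = ET.SubElement(amd, q(METS_NS, "digiprovMD"), {"ID": "prov-1"})
    wrap = ET.SubElement(prov, q(METS_NS, "mdWrap"), {"MDTYPE": "PREMIS:EVENT"})
    data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    event = ET.SubElement(data, q(PREMIS_NS, "event"))
    ident = ET.SubElement(event, q(PREMIS_NS, "eventIdentifier"))
    ET.SubElement(ident, q(PREMIS_NS, "eventIdentifierType")).text = "local"
    ET.SubElement(ident, q(PREMIS_NS, "eventIdentifierValue")).text = "event-1"
    ET.SubElement(event, q(PREMIS_NS, "eventType")).text = "ingestion"
    ET.SubElement(event, q(PREMIS_NS, "eventDateTime")).text = "2010-01-01T00:00:00Z"

    # File section: the essence file is referenced by location, not embedded.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    file_el = ET.SubElement(
        group, q(METS_NS, "file"),
        {"ID": "file-1", "MIMETYPE": "application/mxf", "ADMID": "prov-1"},
    )
    ET.SubElement(
        file_el, q(METS_NS, "FLocat"),
        {"LOCTYPE": "URL", q(XLINK_NS, "href"): essence_href},
    )

    # Structural map: the only mandatory METS section; it points at the file.
    struct = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    div = ET.SubElement(struct, q(METS_NS, "div"), {"LABEL": "programme"})
    ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": "file-1"})
    return mets


package_xml = ET.tostring(
    build_package("urn:example:item-42", "file:///archive/item-42.mxf"),
    encoding="unicode",
)
```

The sketch places the PREMIS event in `digiprovMD`, which is one of several possible placements; choosing among them is exactly the kind of decision that the METS and PREMIS guidelines are meant to settle.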

6.2 Container formats

MXF (SMPTE 377M) [22] appears to be becoming the first choice as archive container format for AV content. Variants supporting the most common codecs in broadcast production (e.g. MPEG-2, D-10 [21], [23]) are defined, and for digital cinema content it is also used as the master and distribution container (according to the DCI specification, with JPEG2000 encoding). Other appealing properties are the option to include uncompressed essence (if a collection can afford the storage costs) and the option to embed a wide range of metadata. Among the various defined operational patterns of MXF, Operational Pattern 1a (SMPTE 378M) [24], which defines a file with a single playable essence comprising a single essence element or interleaved essence elements, is the most relevant in a preservation context.

6.3 Metadata

We discuss here some relevant standardisation initiatives in the areas of AV metadata models, rights models and fragment identification. It is impossible to cover in this paper the wide range of metadata standards for multimedia content; a good overview can be found in [4].

6.3.1 MPEG-7

MPEG-7 [8] is an excellent choice for the description of AV content due to its flexibility and comprehensiveness, but this comes at the price of complexity and potential interoperability problems. In order to partly solve these problems, profiles and levels have been proposed, but the adopted profiles do not support detailed description of AV content and lack semantic constraints. The Detailed Audiovisual Profile (DAVP) [1] aims at describing single multimedia content entities, allowing a comprehensive structural description of the content, including audio and visual feature descriptions. The NHK Metadata Production Framework [14] pursues similar goals, also defining an MPEG-7 profile with semantic constraints. These efforts are currently being harmonised by the EBU ECM SCAIE group [2].

MPQF (MPEG Query Format) [7] is a query format providing a standardised interface for multimedia content information retrieval systems. It covers three aspects: the input query format, the output query format and query management. MPQF is a good candidate for the search and access functionalities in the PrestoPRIME preservation platform, since it specifies the interface through which users can describe their search criteria with a set of precise input parameters, together with a set of preferred output parameters that describe the desired result sets, as specified by the output query format.

6.3.2 MPEG-21

MPEG-21 is a suite of standards which aims at defining a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain.

MPEG-21 REL (Rights Expression Language). MPEG-21 REL [10] provides a machine-readable XML-based standard format for licenses and rights representation, and the possibility to define new profiles and extensions to target specific requirements. Profiles defined so far, which address B2C (business-to-consumer) scenarios, are: MPEG-21 OAC (Open Access Content), for mapping Creative Commons licenses; MPEG-21 DAC (Dissemination and Capture), used in the broadcasting domain and mapping TVAnytime RMPI; MPEG-21 MAM (Mobile And optical Media), used in the mobile environment; and the Media Streaming Profile, currently used in MXM development.

MVCO (Media Value Chain Ontology). MVCO [9] is an ontology for formalising the representation of the Media Value Chain. It couples naturally with the MPEG-21 multimedia framework and has been coded as an OWL ontology. The MVCO is a reference for the definition of a rights ontology in PrestoPRIME.

6.3.3 MXM (MPEG eXtensible Middleware)

MXM [11] is a suite of standards developed for the purpose of enabling the easy design and implementation of media-handling value chains whose devices interoperate because they are all based on the same set of technologies, especially technologies standardised by MPEG, accessible from the MXM middleware. MXM reference software enables the integration of standard modules and technologies within the integrated framework.

6.3.4 W3C Initiatives

The Media Fragments working group is developing a URI-based syntax to address fragments of media resources [25]. The specification supports temporal, spatial, track and named fragments and aims to be more general than MPEG-21 Part 17 with respect to the file formats supported. The Provenance incubator group [19] deals with describing and tracking the provenance of information (especially considering developments in the Semantic Web and linked data). This is a very important issue in the preservation context and an important basis for rights modelling.
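The four fragment dimensions can be illustrated with a small parser. The sketch below assumes the `t=`, `xywh=`, `track=` and `id=` fragment syntax of Media Fragments URI 1.0 as we understand it; it handles only plain-seconds NPT time values (no `hh:mm:ss` or SMPTE timecodes), keeps the last occurrence of a repeated dimension, and performs no validation. The function name and the example URIs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_media_fragment(uri):
    """Extract temporal, spatial, track and named dimensions from a URI fragment."""
    params = parse_qs(urlsplit(uri).fragment)
    result = {}

    if "t" in params:
        value = params["t"][-1]
        if value.startswith("npt:"):       # optional Normal Play Time prefix
            value = value[len("npt:"):]
        start_text, _, end_text = value.partition(",")
        start = float(start_text) if start_text else 0.0
        end = float(end_text) if end_text else None   # None: until the end of the media
        result["t"] = (start, end)

    if "xywh" in params:
        value = params["xywh"][-1]
        unit = "pixel"                     # default unit when no prefix is given
        if ":" in value:
            unit, _, value = value.partition(":")
        x, y, w, h = (int(v) for v in value.split(","))
        result["xywh"] = (unit, x, y, w, h)

    if "track" in params:
        result["track"] = params["track"]  # kept as a list: several tracks may be named

    if "id" in params:
        result["id"] = params["id"][-1]

    return result
```

For example, `parse_media_fragment("http://example.org/video.mp4#t=10,20")` yields a temporal range from second 10 to second 20, which is the kind of addressable segment that rights or provenance metadata could be attached to.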

7. CONCLUSIONS

The amount of AV content in Europe alone is already in the range of millions of hours and is increasing rapidly, yet the long-term fate of a large part of that content is far from certain. PrestoPRIME addresses this issue by tackling, on the one hand, awareness (of content value and risks of loss) and, on the other hand, the capability to provide preservation services while keeping their costs under control. AV content is particularly challenging because of its characteristics. The answers from PrestoPRIME cover the areas of preservation strategies and technologies, the definition of a flexible integration framework, and maximising the opportunities for future exploitation.

The core functionalities of the PrestoPRIME preservation platform will be released as software libraries under an open license. At the same time, a commercial implementation will be available, provided by ExLibris [3]. All the documentation of the platform design and the source code will be publicly available at the project website. In order to support users who need information and assistance in setting up a digital repository and a preservation system, the project partners plan to set up a Competence Centre, which will remain active after the completion of the project.

8. ACKNOWLEDGMENTS

This work is partially supported by the PrestoPRIME project, funded by the European Commission under ICT FP7 (Seventh Framework Programme, Contract No. 231161). The project is coordinated by Institut National de l’Audiovisuel (INA) and integrates 15 partners from all domains, from France, UK, Italy, Austria, the Netherlands and Israel, representing the variety of competencies needed for AV preservation. The project brings together archive owners, research centres from archive institutions, general research centres and universities, and industrial SMEs for development and integration; the partners also include the European Digital Library Foundation (EDLF).

Additional information about PrestoPRIME, including public deliverables and software, as well as other related news, is available on the project Web site [17].

9. ADDITIONAL AUTHORS

Additional authors: Guy Ben-Porat (ExLibris Group, email: [email protected]), Annarita Di Carlo (RAI CRIT, email: [email protected]), Stephen Phillips (IT Innovation, email: [email protected]), Daniel Teruggi (INA, email: [email protected]), Elisa Todarello (EURIX R&D, email: [email protected])

10. REFERENCES

[1] W. Bailer and P. Schallauer. The detailed audiovisual profile: Enabling interoperability between MPEG-7 based systems. In Proceedings of 12th International Multi-Media Modeling Conference, pages 217–224, Beijing, CN, Jan. 2006.

[2] EBU ECM, SCAIE group: Automatic Information extraction. http://tech.ebu.ch/groups/pscaie.

[3] ExLibris Group. http://www.exlibrisgroup.com.

[4] M. Hausenblas (ed.). Multimedia vocabularies on the semantic web. W3C Incubator Group Report, http://www.w3.org/2005/Incubator/mmsem/XGR-vocabularies/, Jul. 2007.

[5] ISO/IEC 21000-2. MPEG-21 Standard Part 2: Digital Item Declaration Language (DIDL).

[6] ISO 14721:2003. CCSDS, Reference Model for an Open Archival Information System (OAIS).

[7] ISO/IEC 15938-12. MPEG-7 Standard Part 12: MPEG Query Format (MPQF).

[8] ISO/IEC 15938. MPEG-7 Standard: Multimedia Content Description Interface.

[9] ISO/IEC 21000-19. MPEG-21 Standard Part 19: Media Value Chain Ontology (MVCO).

[10] ISO/IEC 21000-5. MPEG-21 Standard Part 5: Rights Expression Language (REL).

[11] ISO/IEC 23006. MPEG eXtensible Middleware (MXM).

[12] MDA. Model Driven Architecture. http://www.omg.org/mda/.

[13] METS. Metadata Encoding and Transmission Standard. http://www.loc.gov/standards/mets/.

[14] NHK Metadata Production Framework. http://www.nhk.or.jp/strl/mpf/english/about.htm.

[15] EBU Tech 3295. P_META Metadata Library v2.1, Jul. 2009.

[16] PREMIS. Preservation Metadata Maintenance Activity. http://www.loc.gov/standards/premis.

[17] PrestoPRIME - Keeping Audiovisual Contents Alive. http://www.prestoprime.eu.

[18] PrestoPRIME Deliverable D5.2.1: Definition and Design of a PrestoPRIME Reference Architecture for the Integration Framework, to be published in summer 2010. http://www.prestoprime.eu.

[19] W3C provenance incubator group. http://www.w3.org/2005/Incubator/prov/.

[20] Shaman: Sustaining heritage access through multivalent archiving. http://shaman-ip.eu/shaman/.

[21] SMPTE 356M. Type D-10 Stream Specifications, MPEG-2 4:2:2P @ ML for 525/60 and 625/50.

[22] SMPTE 377M. Material Exchange Format (MXF), File Format Specification, 2004.

[23] SMPTE 386M. Material Exchange Format (MXF), Mapping Type D-10 Essence Data to the MXF Generic Container.

[24] SMPTE 378M. Material Exchange Format (MXF) operational pattern 1a (single item, single package). http://www.smpte.org, 2004.

[25] R. Troncy and E. Mannens (eds.). Media Fragments URI 1.0. W3C Working Draft, http://www.w3.org/TR/media-frags, Dec. 2009.