UNIONE EUROPEA

Digital Libraries on the Grid to preserve cultural Heritage

A use case: Federico De Roberto manuscripts

Leandro Ciuffo on behalf of Dr. Antonio Calanducci ([email protected])
Istituto Nazionale di Fisica Nucleare – Catania



Federico De Roberto cultural heritage

• De Roberto, an Italian writer of the 19th/20th century, born in Naples but living in Catania, left numerous works to the humanities community

• These comprise valuable and hard-to-manage items: manuscripts, typescripts, drafts with handwritten corrections, magazines, clippings, sketches, photos


De Roberto literary collection: digitization

• Digitization of manuscripts, typescripts and printed works
– TIFF files, one per page, 600 dpi, about 100 MB for an A3 page: high-resolution scans for in-depth examination
– Multipage PDF, one per work, 300 dpi, file sizes ranging from 40 to 400 MB: for overall examination of works
– 8000 scans, 2 terabytes of disk space
– Different physical formats: A3, A4, custom sizes


De Roberto literary collection: metadata

• Embedded metadata
– TIFF files carry embedded metadata describing the physical features of the scan and its content:
  ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate
  Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice
– Added with Photoshop after the digitization phase (Adobe XMP format)


Goals and requirements

• Make these works accessible to the humanities research communities
• Find the desired document immediately
– Documents organized according to their physical and semantic metadata
  By type
  By category
  Dynamic filtering of the search result set according to the values selected for one or more document metadata
• Long-term preservation (digital preservation)
– Multiple copies (replicas) spread across different geographical sites
– Reliable storage systems and replica redundancy to achieve secure preservation


Data Management in Grid

• Storage Element (SE): front-end server aggregating a pool of hard disks and providing the illusion of one big (virtual) disk
– “container” of users’ files
– generally one SE per site
– mirrored disks to avoid data loss in case of hardware failures
– fine-grained setup of file permissions: owner, group, given lists of users and groups (Access Control Lists, ACLs)
– keeps the mapping between files and the physical disks of the pool
• File Catalogue: provides a unique virtual file system across several Storage Elements by keeping track of which SE (or SEs) contains a given file
– keeps track of replicas
– maps each file to its Storage Element filename
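The role of the File Catalogue can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class `FileCatalogue`, its methods, the host names and the file names are all hypothetical and do not reflect the API of any real Grid catalogue.

```python
from dataclasses import dataclass, field


@dataclass
class FileCatalogue:
    """Toy catalogue: logical file name -> replica locations (hypothetical model)."""

    # Maps a logical file name to {Storage Element host: physical file name}.
    replicas: dict[str, dict[str, str]] = field(default_factory=dict)

    def register(self, logical_name: str, se_host: str, physical_name: str) -> None:
        """Record that `se_host` holds a copy of `logical_name`."""
        self.replicas.setdefault(logical_name, {})[se_host] = physical_name

    def locate(self, logical_name: str) -> list[str]:
        """Return the Storage Elements holding a replica, in sorted order."""
        return sorted(self.replicas.get(logical_name, {}))


# Made-up host and file names, for illustration only.
catalogue = FileCatalogue()
catalogue.register("/deroberto/scans/sample.tif", "se-01.example.org", "file-0001")
catalogue.register("/deroberto/scans/sample.tif", "se-02.example.org", "file-0417")
print(catalogue.locate("/deroberto/scans/sample.tif"))
```

A user only ever refers to the logical name; the catalogue resolves it to whichever Storage Elements hold a replica.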


Data Management in Grid

• Metadata Catalogue: stores and organizes the metadata of files saved on Storage Elements and registered in the File Catalogue
– metadata is organized by “collection” (a sort of directory); each collection has its own schema, a set of defined attributes
• e.g.: /deroberto/scans/manuscripts
o Title: “La lupa”
o Author: “Federico De Roberto, Giovanni Verga”
o Genre: “Tragedia Lirica”
o Pages: 34
o FileType: TIFF
o surl: srm://infn-se-01.ct.pi2s2.it/dpm/ct.pi2s2.it/home/cometa/generated/2008-06-14/filede4d6266-56c4-4d66-95b6-3d69063ef081
– responsible for answering users’ queries against the metadata describing files, to find their physical location for later retrieval
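A query against a metadata collection can be sketched as a simple attribute match that returns the storage URL (surl) of each matching entry. This is an assumption-laden illustration: the list `entries`, the helper `find_surls`, the second sample entry and the example.org URLs are hypothetical and are not the real catalogue interface.

```python
# Hypothetical in-memory stand-in for the collection /deroberto/scans/manuscripts.
entries = [
    {
        "Title": "La lupa",
        "Genre": "Tragedia Lirica",
        "Pages": 34,
        "FileType": "TIFF",
        "surl": "srm://se-01.example.org/sample/file-0001",  # made-up URL
    },
    {
        "Title": "Sample work",  # made-up entry
        "Genre": "Sample genre",
        "Pages": 10,
        "FileType": "PDF",
        "surl": "srm://se-02.example.org/sample/file-0002",  # made-up URL
    },
]


def find_surls(collection, **criteria):
    """Return the surl of every entry whose attributes equal all given criteria."""
    return [
        entry["surl"]
        for entry in collection
        if all(entry.get(name) == value for name, value in criteria.items())
    ]


print(find_surls(entries, Title="La lupa", FileType="TIFF"))
```

The metadata lookup yields only a location; the file itself is then fetched from the Storage Element named in the surl.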


The Sicilian Grid COMETA


Current deployment (COMETA Grid)

300+ TBytes

International Workshop on Cyberinfrastructure and Archeology, San Miniato (PI), 16th-17th Oct 08


The gLibrary project

• Challenge:
– to offer an intuitive, flexible, secure and multiplatform system to handle digital libraries on a Grid infrastructure
• Digital assets (the items handled in a digital library):
– Any kind of content and/or media represented as a digital file, e.g.:
  Images (photos, scans, screenshots, logos, ...)
  Audio (songs, soundtracks, ringtones, ...)
  Video (movies, trailers, mobile phone videos, ...)
  Presentations, letters, reports, invoices, receipts
  E-books, e-mails, papers, magazines, etc.
• gLibrary makes it possible to store, organize, search and retrieve digital assets in a Grid environment


gLibrary features

• Intuitive front-end implemented as a web application:
– accessible from everywhere; it needs only Internet access
– usable from any web browser (Internet Explorer, Mozilla Firefox, Opera, Safari) on any operating system (Windows, Linux, Mac OS X) ---> multiplatform
  It requires a Java Virtual Machine (available on any OS)
– Extensive use of AJAX (Asynchronous JavaScript and XML), which makes web applications dynamic and interactive, providing a desktop-like user experience


Assets organization

• “Types” and “Categories” are defined by repository providers
• Assets are organized by type:
– a list of specific attributes describing each kind of asset to be managed by the system
– hierarchical (a child type shares and extends its parent’s attributes)
– queried during searches
• and/or organized by category:
– groups together related assets of different types
– also useful to define subsets of assets belonging to the same type
– multiple categories can be assigned to an asset (tagging)
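The type hierarchy and category tagging described above can be sketched with two small classes. This is a minimal illustration, not gLibrary's data model: `AssetType`, `Asset`, the type names "Document" and "Scan" and the category names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssetType:
    """A type owns some attributes and inherits the rest from its parent."""

    name: str
    own_attributes: list[str]
    parent: Optional["AssetType"] = None

    def attributes(self) -> list[str]:
        """Attributes inherited from the parent chain, then this type's own."""
        inherited = self.parent.attributes() if self.parent else []
        return inherited + self.own_attributes


@dataclass
class Asset:
    """An asset has exactly one type and any number of categories (tags)."""

    title: str
    asset_type: AssetType
    categories: set[str] = field(default_factory=set)


# Hypothetical types and categories, for illustration only.
document = AssetType("Document", ["Title", "Author"])
scan = AssetType("Scan", ["Resolution", "ScanQuality"], parent=document)
asset = Asset("La lupa", scan, {"Manuscripts", "Theatre"})

print(scan.attributes())
```

The child type "Scan" exposes its parent's attributes followed by its own, while the asset carries two categories at once.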


Intuitive and instant search

• Assets are browsed by selecting a type (or category) and then one or more filters:
– attributes of the selected type, chosen from a defined list, used to narrow the result set
• Filter application is cascading and context-sensitive: selecting a filter value dynamically influences the values offered by subsequent filters (“à la iTunes” browsing)
– Classical search by description and keywords is available too
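Cascading, context-sensitive filtering can be sketched as follows: once a filter value is selected, the values offered by the next filter are computed only from the assets that still match. The helper names and the sample assets are hypothetical, and the real system queries a metadata service rather than a Python list.

```python
def apply_filters(assets, selected):
    """Keep only the assets whose attributes equal every selected filter value."""
    return [
        asset
        for asset in assets
        if all(asset.get(name) == value for name, value in selected.items())
    ]


def available_values(assets, selected, attribute):
    """Values of `attribute` still reachable after applying the selected filters."""
    matching = apply_filters(assets, selected)
    return sorted({asset[attribute] for asset in matching if attribute in asset})


# Made-up sample assets, for illustration only.
assets = [
    {"Title": "A", "FileType": "TIFF", "ScanQuality": "good"},
    {"Title": "B", "FileType": "PDF", "ScanQuality": "good"},
    {"Title": "C", "FileType": "PDF", "ScanQuality": "poor"},
]

print(available_values(assets, {}, "ScanQuality"))
print(available_values(assets, {"FileType": "TIFF"}, "ScanQuality"))
```

With no selection both quality values are offered; after choosing FileType TIFF only the values present among TIFF assets remain.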


Details of the selected asset


Assets storing and retrieval

• Users can upload their local assets to one or more Storage Elements of the Grid (creating replicas)
– Uploads are managed through Java applets
– Files already on an SE can be included in a digital library through the File Catalogue browser
• Download from SEs to the user’s laptop/desktop:
– selection of a replica link from a list
– download through a Java applet


Security and user management

• Being a Grid application, gLibrary inherits all the security features of the underlying technologies
– X.509 digital certificate authentication
– Transfers based on proxy authorization
– VOMS (Virtual Organization Membership Service) used to distinguish users and assign the right permissions
• 3 kinds of user role for each digital library deployed:
– gLibraryManager:
  defines the hierarchies of types and categories (with their attributes) and filters
  grants submission rights to generic users
– gLibrarySubmitter: uploads new assets and defines permissions on their entries (fine-grained rights assignment)
– generic users: allowed to search and download (the assets they have rights to)
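The three roles can be pictured as a lookup from role to permitted actions. This is a sketch only: the action names are hypothetical, and treating the roles as disjoint sets is an assumption made for illustration, since the slides do not say whether a manager or submitter also holds the generic user's rights.

```python
from enum import Enum


class Role(Enum):
    MANAGER = "gLibraryManager"
    SUBMITTER = "gLibrarySubmitter"
    GENERIC = "generic user"


# Hypothetical action names; the role split follows the list above.
PERMISSIONS = {
    Role.MANAGER: {"define_types", "define_categories", "define_filters", "grant_submission"},
    Role.SUBMITTER: {"upload", "set_asset_permissions"},
    Role.GENERIC: {"search", "download"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if `role` is permitted to perform `action` in this toy model."""
    return action in PERMISSIONS[role]


print(is_allowed(Role.SUBMITTER, "upload"))
print(is_allowed(Role.GENERIC, "upload"))
```

In the real system the role would come from VOMS rather than from a local table, and per-asset permissions would further restrict what a generic user can download.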


gLibrary architecture

[Architecture diagram. Components: User, Login applet, Upload/Download applet, VOMS Server, AMGA Metadata Catalogue, LFC File Catalogue, Storage Elements (SE). Numbered steps:]
1. local proxy creation
2. proxy transfer over HTTPS
3. get role
4. find the right asset
5. proxy retrieved over HTTPS
6. direct transfer from SE


Possible usage scenarios

• Suitable for communities that need to share large amounts of digital resources in an easy and secure way
• Some examples:
– “consumer” users: sharing of photos, music, movies, documents, office files, etc.
– enterprise/industrial/research communities: presentations, invoices, layouts, sounds, scans, manuscripts :)
• Each community defines how to describe its content (and how to search for it), setting permissions to grant or deny access to specific users, groups and whole organizations, while exploiting the huge storage capacity, organization and security features offered by a Grid infrastructure
• A use case: “De Roberto Digital Repository”


• Goals:
– to store the 8000 scans of the De Roberto heritage ---> Grid Storage Elements
– to enable ubiquitous, round-the-clock access for scientists ---> web application
– document organization for fast search ---> metadata services
– long-term digital preservation of data ---> redundancy through replicas of files on several Storage Elements
– easy-to-use interface for search, organization, upload and download of digitized documents


Metadata used in the DR digital library

• Type definitions for the assets of the DR library
• Attribute definitions per type, e.g.:

Attribute         Value
Title             la lupa
Author            federico de roberto, giovanni verga
Description       manoscritto della tragedia lirica …
Keywords          verismo, federico de roberto, la lupa, …
CaptionWriter     stefania iannizzotto, alessandro …
CopyrightStatus   copyrighted
PageNum           5
TotalPages        34
DocumentGenre     tragedia lirica
PublicationYear   1916
Publisher         officine tipo-litografiche barravecchia e balestrini
FileType          PDF
Resolution        300
ScanQuality       good

• Filter definitions per type, e.g.:
– DocumentGenre
– Title
– FileType
– ScanQuality
– DocumentType
– PublicationYear
– PublicationStatus
– Publisher
– Location


Browsing and filtering screenshot


Downloading


Download completed


Upload


Automatic metadata extraction

• Some libraries allow automatic metadata extraction from given file types:
– exiftool
– Imagero
• Both were able to read XMP metadata. E.g.:

  $ exiftool -E -XMP:Subject -XMP:Description -XMP:Rights -XMP:Title -XMP:Author -FileName -FileSize 001\ gli\ illustri\ amanti.tif
  Subject : federico de roberto, manoscritti letterari, verismo, gli illustri amanti, la.mu.s.a., facoltà di lettere e filosofia catania, società di storia patria per la sicilia orientale
  Description : manoscritto de gli illustri amanti, conservato presso la biblioteca della società di storia patria per la sicilia orientale
  Rights : società di storia patria per la sicilia orientale catania.la.mu.s.a., facoltà di lettere e filosofia, università degli studi di catania
  Title : gli illustri amanti
  File Name : 001 gli illustri amanti.tif
  File Size : 106 MB

• We are working to integrate those libraries to speed up the acquisition stage
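One way such an integration could look is to run the exiftool command shown on this slide from a script and turn its "Tag : value" output into a dictionary. This is a sketch, not the integration the authors built: the helper names `build_command`, `parse_output` and `extract_metadata` are hypothetical, and it assumes that `exiftool` is installed and on the PATH.

```python
import subprocess

# Same options and tags as the command line shown on the slide.
TAGS = [
    "-XMP:Subject", "-XMP:Description", "-XMP:Rights",
    "-XMP:Title", "-XMP:Author", "-FileName", "-FileSize",
]


def build_command(path):
    """Build the exiftool argument list; list form avoids shell escaping of spaces."""
    return ["exiftool", "-E", *TAGS, path]


def parse_output(text):
    """Turn 'Tag : value' lines into a dict, splitting at the first colon only."""
    metadata = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if separator:
            metadata[name.strip()] = value.strip()
    return metadata


def extract_metadata(path):
    """Run exiftool on `path` (assumes it is installed) and parse its output."""
    result = subprocess.run(
        build_command(path), capture_output=True, text=True, check=True
    )
    return parse_output(result.stdout)


sample = "Title : gli illustri amanti\nFile Size : 106 MB\n"
print(parse_output(sample))
```

The resulting dictionary could then be used to pre-fill the attribute form at upload time, leaving the submitter to review the values instead of typing them in.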


Summary

• The gLibrary challenge is to offer a flexible, multiplatform, secure and easy-to-use system to handle digital libraries on the Grid
– flexible: handles any kind of asset, as defined by the library admin
– multiplatform: implemented as a web application with Java applets, it can be accessed from any OS
– secure: fine-grained permissions (based on Grid certificates) can be set on assets
– easy-to-use: its intuitive interface, with an “à la iTunes” browser, lets users find the desired asset with just a few mouse clicks
• A prototype of the De Roberto Digital Repository was implemented with gLibrary in a few weeks. It will enable scientists to access these works from anywhere and at any time in a simple and smart way, and it will allow the long-term preservation of this cultural heritage


References

• Contact: [email protected], [email protected]
• Prototype of the De Roberto Digital Repository:
– https://glibrary.ct.infn.it/deroberto/
• gLibrary project homepage (currently under maintenance):
– https://glibrary.ct.infn.it/
• Papers:
– A. Calanducci, C. Cherubino, L. N. Ciuffo, D. Scardaci, “A Digital Library Management System for the Grid”, Fourth International Workshop on Emerging Technologies for Next-generation GRID (ETNGRID 2007) at 16th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2007), GET/INT Paris, France, June 18-20, 2007 (http://etngrid.diit.unict.it/2007/index.html)
– A. Calanducci, C. Cherubino, L. N. Ciuffo, D. Scardaci, “gLibrary: Digital Asset Management System for the Grid”, IEEE Hypermedia and Grid Systems Conference at 30th Jubilee International Convention MIPRO, Opatija, Croatia, May 21-25, 2007 (http://www.mipro.hr/)


Thanks for your attention

https://glibrary.ct.infn.it/deroberto/
