
Page 1: Sandy Payette Cornell Information Science

The Mellon-Funded Fedora Project

A Briefing for the Los Alamos National Laboratory

August 26, 2002

Sandy Payette

Cornell Information Science

Page 2: Sandy Payette Cornell Information Science

Motivation

The Problem of Complex Content

Page 3: Sandy Payette Cornell Information Science

Digital Library Content: not just documents ...

Some familiar objects

Complex, compound, dynamic objects

Page 4: Sandy Payette Cornell Information Science

Key Research Questions

How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?

How can complex objects be designed to be both generic and genre-specific at the same time?

How can we hide the complexity of an object’s underlying data structures and relationships from clients?

How can we associate services and tools with objects to provide different presentations or transformations of the object content?

How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?

Page 5: Sandy Payette Cornell Information Science

The Flexible Extensible Digital Object Repository Architecture (FEDORA)

Developed as a DARPA- and NSF-funded research project at Cornell (1997-present)
– CORBA-based reference implementation
– Extensive interoperability testing
– Policy enforcement

Interpreted and re-implemented at the University of Virginia (1999)
– Simple web-oriented implementation, focused on access to collections
– Java servlet and relational database

Virginia prototype supported a testbed of 10,000,000 digital objects with very good results (1999-2001)

The Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured, web-based production FEDORA system (2002+)

Page 6: Sandy Payette Cornell Information Science

FEDORA: Original Research Goals

• Flexibility – object model that fits many different contexts
• Management – of distributed digital content and services
• Access – stable interfaces to digital objects; behavior-centric
• Interoperability – among digital objects and repositories
• Extensibility – easy evolution of object behaviors
• Security – rights management and access control
• Preservation – of content, plus “look and feel”

Page 7: Sandy Payette Cornell Information Science

Model for Collaboration: Digital Library Research and Real Library Requirements

University of Virginia developing extensive digital collections since 1992

Virginia Digital Library R&D Group chartered with finding solution for integration
– Formal requirements analysis
– Search for commercial products
– Discovery: Cornell research parallels stated requirements

Page 8: Sandy Payette Cornell Information Science

Virginia Requirements: Heterogeneous Digital Collections

Books, Rare Books, Multimedia, Music, E-texts, Maps, Photographs, Statistics, Video, Art, Manuscripts, Data, Images, 3-D Objects, Journals, Sound Effects

Page 9: Sandy Payette Cornell Information Science

Virginia Requirements: Managing the Collections

Scalability to support hundreds of millions of objects

Persistent unique names for all resources without respect to machine address

Support inter-relationships among objects

Manage the digital resources and metadata, as well as computer programs, services and tools that support them

Enforce appropriate policies for use of Library resources

Provide a high level of security

Support preservation activities appropriately

Page 10: Sandy Payette Cornell Information Science

Virginia Requirements: Delivering the Collections

Well-architected, flexible relationships between services/tools and digital content

Digital objects, themselves, have ability to provide users with an appropriate launch-pad or tool to use the object content

Every resource can be used in any number of contexts

Move towards a digital library that is configurable by an “aware” user

Provide resource discovery (searching) across the full collection

Deep searching in particular collections

Page 11: Sandy Payette Cornell Information Science

Shortcomings of commercial digital library products

Narrow focus on specific media formats (e.g. image databases, document management)

Fail to effectively address interrelationships among digital entities

Fail to address interoperability; no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability

Fail to provide facilities for managing programs and tools that are integral to delivering digital content.

Not extensible; do not enable easy integration of new tools and services

Page 12: Sandy Payette Cornell Information Science

The Fedora Architecture

Overview of Basic Model

Page 13: Sandy Payette Cornell Information Science

FEDORA Basic Architectural Abstractions

• Digital Object
– Container for aggregating any digital content
– Content disseminations based on behavior definitions
– Extensibility of behavior mechanisms

• Repository
– Service layer for “contained” Digital Objects
– Object lifecycle management
– Access management

Page 14: Sandy Payette Cornell Information Science

FEDORA Digital Object

Persistent ID (PID): globally unique persistent id

Disseminators: public view; access methods for obtaining “disseminations” of digital object content

System Metadata: internal view; metadata necessary to manage the object

Datastreams: protected view; content that makes up the “basis” of the object
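The four parts of a digital object can be pictured as a small data structure. The following is only an illustrative sketch, not the Fedora implementation: the class and field names are assumptions, and the sample identifiers are borrowed from the METS mapping slides later in this briefing.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """Protected view: one piece of content in the object's basis."""
    ds_id: str
    mime_type: str
    location: str  # hypothetical field: URL or path of the content


@dataclass
class Disseminator:
    """Public view: pairs a behavior definition with a behavior mechanism."""
    name: str
    bdef_pid: str   # PID of the Behavior Definition Object
    bmech_pid: str  # PID of the Behavior Mechanism Object


@dataclass
class DigitalObject:
    pid: str                                             # globally unique persistent id
    system_metadata: dict = field(default_factory=dict)  # internal view
    disseminators: list = field(default_factory=list)    # public view
    datastreams: list = field(default_factory=list)      # protected view

    def public_view(self):
        """Return only disseminator names; datastreams stay hidden."""
        return [d.name for d in self.disseminators]


obj = DigitalObject(pid="uva-lib:1225")
obj.datastreams.append(Datastream("DS1", "image/jpeg", "http://uva.edu/img8a.jpg"))
obj.disseminators.append(Disseminator("DISS1", "uva-bdef:8", "uva-bmech:12"))
print(obj.public_view())
```

The point of the sketch is the separation of views: a client sees disseminator names, never the datastream locations.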

Page 15: Sandy Payette Cornell Information Science

FEDORA Digital Object Architecture

[Diagram: three kinds of digital object]

Data Object: Persistent ID (PID), Disseminators, System Metadata, Datastreams

Behavior Definition Object: Persistent ID (PID), Service Definition Metadata, System Metadata, Datastreams

Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata, System Metadata, Datastreams

Page 16: Sandy Payette Cornell Information Science

Data Object Association to External Behavior Service

[Diagram: a Fedora Repository holding two objects, linked to a Remote Watermark Service]

Data Object (PID = uva-lib:1225)
– Disseminators: Watermarker (references <BMech-PID>)
– System Metadata
– Datastreams: watermark file, medres. image file, highres. image file

Behavior Mechanism Object (PID = bmech-img:12)
– Disseminators: Bootstrap
– System Metadata
– Datastreams: WSDL definitions, DatastreamBindSpec, User Documentation

Page 17: Sandy Payette Cornell Information Science

Digital Object Interoperability: Common Behaviors for Variable Content

Web-Image Behavior Definition
– GetThumbnail
– GetLowResolution
– GetMedResolution
– GetHighResolution

Digital Object A
– PID
– Disseminators: Web-image
– System Metadata
– Datastreams (4 image files): thumbnail image file, medres. image file, highres. image file, maxres. image file

Digital Object B
– PID
– Disseminators: Web-image
– System Metadata
– Datastreams (1 wavelet file): MrSID encoded file

Functional equivalency
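Functional equivalency can be sketched as two classes that satisfy one interface while storing their content differently. This is a toy illustration under stated assumptions: the method names loosely follow the Web-Image behavior definition, the file names are invented, and `_extract` merely stands in for a real MrSID decoding service.

```python
from typing import Protocol


class WebImage(Protocol):
    """Hypothetical rendering of the Web-Image behavior definition."""
    def get_thumbnail(self) -> str: ...
    def get_high_resolution(self) -> str: ...


class FourFileImage:
    """Object A: one stored file per resolution."""
    def __init__(self, files: dict):
        self.files = files

    def get_thumbnail(self) -> str:
        return self.files["thumbnail"]

    def get_high_resolution(self) -> str:
        return self.files["highres"]


class MrSIDImage:
    """Object B: a single wavelet file; resolutions are derived on request."""
    def __init__(self, sid_file: str):
        self.sid_file = sid_file

    def _extract(self, level: str) -> str:
        # Stand-in for a call to a real decoding service (assumption).
        return f"{self.sid_file}?level={level}"

    def get_thumbnail(self) -> str:
        return self._extract("thumbnail")

    def get_high_resolution(self) -> str:
        return self._extract("high")


def show_thumbnail(image: WebImage) -> str:
    # Client code is identical for both objects.
    return image.get_thumbnail()


a = FourFileImage({"thumbnail": "a_thumb.jpg", "highres": "a_high.jpg"})
b = MrSIDImage("b.sid")
print(show_thumbnail(a), show_thumbnail(b))
```

The client function never inspects how many datastreams an object has; it relies only on the shared behavior definition.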

Page 18: Sandy Payette Cornell Information Science

Digital Object Extensibility: Adding New Behaviors

The same underlying content can be operated on in novel ways to create new disseminations not originally conceived of.

Digital Object (Book)
– PID
– Disseminators: Web-book
– System Metadata
– Datastreams: TEI file, page 1 image file, page 2 image file, page 3 image file

The same Digital Object with an added disseminator (Photo Collection)
– PID
– Disseminators: Web-book, Photo-seek
– System Metadata
– Datastreams: TEI file, page 1 image file, page 2 image file, page 3 image file

Page 19: Sandy Payette Cornell Information Science

Virginia Prototype

Content Models and Fedora Demos

Page 20: Sandy Payette Cornell Information Science

General Image Content Model (Mycenae image example)

Persistent ID (PID)

Disseminators (Disseminator / Behavior Definition / Behavior Mechanism)

web_image1 / web_image / web_image1
– get_thumb: HTTP GET
– get_med: imagedisplay.java
– get_high: HTTP GET
– get_veryhigh: HTTP GET

web_default_image / web_default / web_default_image
– get_as_page: imagedisplay.java
– get_in_context: HTTP GET (thumb)

System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata

Datastreams
– basis1: pointer to thumbnail size image
– basis2: pointer to medium resolution image
– basis3: pointer to high resolution image
– basis4: pointer to highest resolution image

Page 21: Sandy Payette Cornell Information Science

MrSID Image Content Model (Pavilion III image example)

Persistent ID (PID)

Disseminators (Disseminator / Behavior Definition / Behavior Mechanism)

web_image_mrsid / web_image / web_image_mrsid
– get_thumb: get_image.pl
– get_med: get_image.pl
– get_high: get_image.pl
– get_veryhigh: get_image.pl

web_default_image / web_default / web_default_image
– get_as_page: get_image.pl
– get_in_context: get_image.pl

System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata

Datastreams
– basis1: pointer to MrSID formatted image
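Comparing the two image content models, the web_image methods are the same and only the mechanism behind each method differs. A minimal sketch of that idea, using plain dictionaries; the dictionary layout and the `mechanism_for` helper are assumptions made for illustration, while the method and script names come from the two slides.

```python
# Method -> mechanism tables for the web_image behavior definition.
GENERAL_IMAGE = {
    "get_thumb": "HTTP GET",
    "get_med": "imagedisplay.java",
    "get_high": "HTTP GET",
    "get_veryhigh": "HTTP GET",
}

# The MrSID model answers every method with the same script.
MRSID_IMAGE = {method: "get_image.pl" for method in GENERAL_IMAGE}


def mechanism_for(content_model: dict, method: str) -> str:
    """Look up which mechanism serves a method in a given content model."""
    try:
        return content_model[method]
    except KeyError:
        raise ValueError(f"method not in behavior definition: {method}") from None


print(mechanism_for(GENERAL_IMAGE, "get_med"))
print(mechanism_for(MRSID_IMAGE, "get_med"))
```

Because both tables share the same keys, a client written against web_image works with either content model.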

Page 22: Sandy Payette Cornell Information Science

Finding Aid Content Model (Finding Aid example)

Persistent ID (PID)

Disseminators (Disseminator / Behavior Definition / Behavior Mechanism)

web_ead1 / web_ead / web_ead1
– get_web_default: eaddoc.java
– get_tp: tp.xsl
– get_admin: admin.xsl
– get_summary: summary.xsl
– get_scopecontent: scopecontent.xsl
– get_bioghist: bioghist.xsl
– get_component: component.xsl
– get_arrangement: arrangement.xsl
– get_organization: organization.xsl
– get_document: document.xsl
– get_menu: menu.xsl

web_default_ead1 / web_default / web_default_ead1
– get_as_page: eaddoc.java
– get_in_context: document.xsl

System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata

Datastreams
– basis1: pointer to XML Finding Aid source

Page 23: Sandy Payette Cornell Information Science

TEI Letter Content Model (TEI letter example)

Persistent ID (PID)

Disseminators (Disseminator / Behavior Definition / Behavior Mechanism)

web_teiletter1 / web_teiletter / web_teiletter1
– get_teiletter_default: teiletterdoc.pl
– get_original: letter.header.xsl
– get_modern: modern.xsl
– get_teiheader: teiheader.xsl
– get_pageimages: pageimages.xsl

web_default_teiletter / web_default / web_default_teiletter
– get_as_page: teiletterdoc.pl
– get_in_context: letter.header.xsl

System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata

Datastream(s)
– basis1: pointer to XML TEI letter source

Page 24: Sandy Payette Cornell Information Science

TEI Book Content Model (TEI book example)

Persistent ID (PID)

Disseminators (Disseminator / Behavior Definition / Behavior Mechanism)

web_teibook1 / web_teibook / web_teibook1
– get_web_default: teidoc.java
– get_teiheader: admin.xsl
– get_toc: contents.xsl
– get_menu_teibook: menu.xsl
– get_tp_teibook: tp.xsl
– get_id: id.xsl

web_default_teibook / web_default / web_default_teibook
– get_as_page: teidoc.java
– get_in_context: contents.xsl

System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata

Datastreams
– basis1: pointer to XML TEI book source

Page 25: Sandy Payette Cornell Information Science

GDMS Content Model (Mycenae example; lawn example)

Persistent ID (PID)

Disseminators (Disseminator / Behavior Definition / Behavior Mechanism)

web_gdms2 / web_gdms / web_gdms2
– get_web_default: imagedef.java
– get_gdmswalk: gdmswalk.xsl
– get_menu: imagemenu.xsl

web_default_gdms / web_default / web_default_gdms
– get_as_page: imagedef.java
– get_in_context: HTTP GET

System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata

Datastream
– basis1: pointer to XML GDMS source file

Page 26: Sandy Payette Cornell Information Science

Numerical Data Content Model (ICPSR survey example)

Persistent ID (PID)

Disseminators (Disseminator / Behavior Definition / Behavior Mechanism)

web_icpsr1 / web_icpsr / web_icpsr1
– get_web_default: loader.pl
– get_abstract: abstract.xsl
– get_citation: citation.xsl
– get_details: technical.xsl
– get_question: variables.xsl
– get_subset: codebook.pl
– get_study: ftpstudy.pl

web_default_icpsr1 / web_default / web_default_icpsr1
– get_as_page: loader.pl
– get_in_context: abstract.xsl

System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata

Basis Datastream(s)
– basis1: XML Codebook source
– basis2 (TBD): pointer to SQL Database containing data

Page 27: Sandy Payette Cornell Information Science

The New FEDORA

Technical Specifications – Part I

Page 28: Sandy Payette Cornell Information Science

Background Material

Overview of Web Service Technologies

Page 29: Sandy Payette Cornell Information Science

What is a Web Service?

A distributed application that runs over the internet.

An addressable network endpoint which receives structured messages and returns structured responses.

A web application that publishes an open interface through which clients can send requests and receive responses.

Page 30: Sandy Payette Cornell Information Science

How is this different from plain old web applications?

Formally defined API (application programming interface) defines a set of abstract operations for a web service

Published bindings for clients to run operations

Standard protocol for invoking operations on the service

XML as standard means of encoding service requests and responses

Page 31: Sandy Payette Cornell Information Science

Why are Web Services important?

Interoperability
– Web applications can interact and build upon each other
– Data is transferred in an interoperable manner (e.g., over HTTP)
– Data is encoded in an interoperable format (XML)

Works in a decentralized, distributed, operating-system-independent environment

Standards-oriented

Means to expose complex operations with rich data typing (via XML Schema language typing)

Ease of integrating distributed systems via the Web

W3C effort to develop this service architecture

Page 32: Sandy Payette Cornell Information Science

How are Web Services Implemented?

The Simple Object Access Protocol (SOAP) Approach
– SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP)
– Operation oriented (send a request to an endpoint)
– Like CORBA, RMI, DCOM… but for the Web, and simpler
– Application APIs can be defined and published using the Web Services Description Language (WSDL)
– Requests and responses sent as XML messages
– Supports simple and complex data typing in requests and responses
– Supports transmission of binary data within request or response packages

Page 33: Sandy Payette Cornell Information Science

How are Web Services Implemented?

The REST (Representational State Transfer) Approach
– URI + HTTP + XML
– URI/resource driven; message built into a URI (URL)
– HTTP GET or POST
– Response is XML data
– Issues:
  Not a standard, but a style of building web applications; arguably it just gives a fancy name to how many people build applications on the web by default. Nothing really new here; it argues for doing things the way we have been, perhaps a little more standardized by using XML.
  Fragile service definition: URLs change
  No data typing on requests
  Limited ability to transmit complex requests in a URL
  W3C is behind SOAP, but there is only one strong voice out there for REST (Prescod)

Page 34: Sandy Payette Cornell Information Science

Example of Web Service using SOAP

[Diagram: My Application exchanges SOAP messages over HTTP with the Google Web Service]

SOAP Request (XML): doSpellingSuggestion(payet)

SOAP Response (XML): payette

Page 35: Sandy Payette Cornell Information Science

XML SOAP Request

<?xml version="1.0" encoding="UTF-8"?>

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema">

<SOAP-ENV:Body>

<m:doSpellingSuggestion xmlns:m="urn:GoogleSearch">

<key>/e325JlNPASJu</key>

<phrase>payet</phrase>

</m:doSpellingSuggestion>

</SOAP-ENV:Body>

</SOAP-ENV:Envelope>
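A request of this shape can be assembled programmatically. The sketch below is only an illustration using Python's standard `xml.etree.ElementTree`; the function name is an assumption, the key value is a placeholder, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_spelling_request(key: str, phrase: str) -> bytes:
    """Build a doSpellingSuggestion SOAP envelope as UTF-8 bytes."""
    # Keep the conventional SOAP-ENV prefix in the serialized output.
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, "{urn:GoogleSearch}doSpellingSuggestion")
    ET.SubElement(call, "key").text = key
    ET.SubElement(call, "phrase").text = phrase
    return ET.tostring(envelope, encoding="UTF-8", xml_declaration=True)


# "PLACEHOLDER-KEY" is a hypothetical value, not a real license key.
request = build_spelling_request("PLACEHOLDER-KEY", "payet")
print(request.decode("utf-8"))
```

The serializer may choose its own prefix for the `urn:GoogleSearch` namespace; the message is equivalent as long as the namespace URI matches.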

Page 36: Sandy Payette Cornell Information Science

XML SOAP Response

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
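On the client side, the suggestion is read out of the response body. A minimal sketch, again using the standard `xml.etree.ElementTree`; the function name is an assumption, and the sample message is the response from this slide.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def spelling_suggestion(soap_response: bytes) -> str:
    """Extract the text of the <return> element from the SOAP body."""
    root = ET.fromstring(soap_response)
    path = (
        f"{{{SOAP_ENV}}}Body/"
        "{urn:GoogleSearch}doSpellingSuggestionResponse/return"
    )
    result = root.find(path)
    if result is None:
        raise ValueError("no <return> element in SOAP response")
    return result.text


print(spelling_suggestion(RESPONSE))
```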

Page 37: Sandy Payette Cornell Information Science

New Fedora: Key Features

Repository system exposed as two related Web services
– described using WSDL
– both SOAP and HTTP bindings

Digital objects encoded and stored as XML using Metadata Encoding and Transmission Standard (METS)

Digital object behaviors implemented as linkages to distributed web services (also described using WSDL)

Digital objects support versioning of both content and services.

Page 38: Sandy Payette Cornell Information Science

New Fedora System

[Diagram: layered architecture]

Clients: Web browsers, Custom Clients

Fedora Web Service Layer: API-M (Management Interface), API-A (Access Interface)

Core Sub-System Implementations

Data Store Layer: XML digital objects, SQL digital object cache, Managed Content Datastreams, External Content Datastreams

Page 39: Sandy Payette Cornell Information Science

Web Service Communication View

[Diagram. Recoverable elements:]

Clients: Batch Ingest Client, Management Client, Web Browser, Access Client

Message Protocol Layer: SOAP and HTTP

Transport Protocol Layer: http, smtp, other

Services: Management Service (API-M) and Access Service (API-A), each exposed over SOAP and HTTP

Core Sub-System Implementations

Digital Object Storage: XML Files, Relational DB

Datastream Storage: Managed Content; External Content Retriever reaching External Content Sources over http and ftp

Remote Behavior Mechanism Services: reached over HTTP or SOAP

Page 40: Sandy Payette Cornell Information Science

The New FEDORA

Encoding Digital Objects in XML

Page 41: Sandy Payette Cornell Information Science

Metadata Encoding and Transmission Standard (METS)

XML “standard” for encoding descriptive, administrative, and structural metadata of digital library objects

Developed under auspices of the Digital Library Federation

METS standard maintained by the Network Development and MARC Standards Office of the Library of Congress

http://www.loc.gov/standards/mets/

Page 42: Sandy Payette Cornell Information Science

METS Schema

METS is written in the XML Schema Language

METS defines four sections for an object
– Descriptive metadata
– Administrative metadata
– File group
– Structure map

METS goals include:
– Facilitate management of objects within a repository
– Provide a standard format for exchange of objects between repositories
– Provide a standard format for transmission of objects to users for rendering (via tools or applications)

Page 43: Sandy Payette Cornell Information Science

Mapping Fedora to METS

Fedora: Persistent Identifier (PID)
METS:

<METS:mets OBJID="uva-lib:1225"/>

Fedora: Datastreams
METS:

<METS:fileGrp ID="DATASTREAMS">
  <METS:fileGrp ID="DS1" STATUS="A">
    <!-- Version 2: High resolution image -->
    <METS:file ID="DS1.1" CREATED="2002-05-20T06:32:00" MIMETYPE="image/jpeg">
      <METS:FLocat LOCTYPE="URL" xlink:href="http://uva.edu/img8a.jpg"/>
    </METS:file>
    <!-- Version 1: High resolution image -->
    <METS:file ID="DS1.0" CREATED="2002-05-10T02:32:00" MIMETYPE="image/jpeg">
      <METS:FLocat LOCTYPE="URL" xlink:href="http://uva.edu/img8a.jpg"/>
    </METS:file>
  </METS:fileGrp>
</METS:fileGrp>

Page 44: Sandy Payette Cornell Information Science

Mapping Fedora to METS

Fedora: System Metadata
METS:

<METS:dmdSec/>
<METS:amdSec/>

Fedora: Disseminator
METS:

<METS:behaviorSec ID="DISS1" STATUS="A" STRUCTID="S1">
  <METS:mechanism LOCTYPE="URN" xlink:href="uva-bmech:12"/>
  <METS:interfaceDef LOCTYPE="URN" xlink:href="uva-bdef:8"/>
</METS:behaviorSec>

<METS:structMap TYPE="fedora:dsBindingMap" ID="S1">
  <METS:div TYPE="uva-bmech:12">
    <METS:div TYPE="IMAGE-HIGH" ORDER="0">
      <METS:fptr FILEID="DS1"/>
    </METS:div>
  </METS:div>
</METS:structMap>
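The disseminator mapping can be followed mechanically: a behaviorSec names a mechanism and points, through STRUCTID, at a structMap that binds named keys to datastream IDs. The sketch below illustrates that traversal with the standard `xml.etree.ElementTree`. It is not Fedora code: the function name is hypothetical, and the namespace URIs are assumptions, since the slides do not show the declarations.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions; the slide omits the declarations.
NS = {
    "METS": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

SAMPLE = b"""<METS:mets xmlns:METS="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink" OBJID="uva-lib:1225">
  <METS:behaviorSec ID="DISS1" STATUS="A" STRUCTID="S1">
    <METS:mechanism LOCTYPE="URN" xlink:href="uva-bmech:12"/>
    <METS:interfaceDef LOCTYPE="URN" xlink:href="uva-bdef:8"/>
  </METS:behaviorSec>
  <METS:structMap TYPE="fedora:dsBindingMap" ID="S1">
    <METS:div TYPE="uva-bmech:12">
      <METS:div TYPE="IMAGE-HIGH" ORDER="0">
        <METS:fptr FILEID="DS1"/>
      </METS:div>
    </METS:div>
  </METS:structMap>
</METS:mets>
"""


def datastream_bindings(mets_xml: bytes) -> dict:
    """Map each disseminator ID to its mechanism and datastream bindings."""
    root = ET.fromstring(mets_xml)
    href = f"{{{NS['xlink']}}}href"
    bindings = {}
    for behavior in root.findall("METS:behaviorSec", NS):
        mechanism = behavior.find("METS:mechanism", NS).get(href)
        struct_id = behavior.get("STRUCTID")
        struct_map = root.find(f"METS:structMap[@ID='{struct_id}']", NS)
        keys = {}
        # Inner divs carry the binding key; fptr elements name datastreams.
        for div in struct_map.findall("METS:div/METS:div", NS):
            keys[div.get("TYPE")] = [
                fptr.get("FILEID") for fptr in div.findall("METS:fptr", NS)
            ]
        bindings[behavior.get("ID")] = {"mechanism": mechanism, "keys": keys}
    return bindings


print(datastream_bindings(SAMPLE))
```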

Page 45: Sandy Payette Cornell Information Science

Digital Object Versioning

Versioning within Data Objects
– Datastream versioning
  Date/time stamped
  New version every time datastream is modified
– Disseminator versioning
  Date/time stamped
  New version if disseminator is modified to reference a different Behavior Mechanism (“better mousetrap”)

Versioning within Behavior Definition and Mechanism Objects
– New versions of WSDL metadata recorded in these objects (with date/time stamps)
– This deserves much more explanation than this slide can offer!
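Date/time-stamped versions make it possible to ask for a datastream as it existed at a given moment. The following is a sketch of that lookup under stated assumptions: the class and function names are invented, and the two sample versions reuse the IDs and timestamps from the METS datastream example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DatastreamVersion:
    version_id: str
    created: datetime
    location: str


def version_as_of(versions, when: Optional[datetime] = None) -> DatastreamVersion:
    """Return the newest version created at or before `when`.

    With no `when`, the most recent version is returned.
    """
    candidates = [v for v in versions if when is None or v.created <= when]
    if not candidates:
        raise LookupError("no version existed at the requested time")
    return max(candidates, key=lambda v: v.created)


history = [
    DatastreamVersion("DS1.0", datetime(2002, 5, 10, 2, 32), "http://uva.edu/img8a.jpg"),
    DatastreamVersion("DS1.1", datetime(2002, 5, 20, 6, 32), "http://uva.edu/img8a.jpg"),
]

print(version_as_of(history).version_id)
print(version_as_of(history, datetime(2002, 5, 15)).version_id)
```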

Page 46: Sandy Payette Cornell Information Science

METS: Sample Fedora Object

Click here for image digital object

Page 47: Sandy Payette Cornell Information Science

Fedora Dissemination Database

Alternate form of object storage that will act as a cache of most recent versions of digital objects

Ensure high-performance access (disseminations)

Repository system replicates from authoritative XML version of objects to relational database

Plan to phase out the database in Phase 2-3:
– Access sub-system to work completely off the XML storage, as XML tools improve performance-wise
– Pursue different caching strategies as necessary

Page 48: Sandy Payette Cornell Information Science

The New FEDORA

Repository System Design

Page 49: Sandy Payette Cornell Information Science

Fedora Repository System

[Diagram. Recoverable elements:]

Clients: Batch Ingest Client, Management Client, Web Browser

Fedora Web Service Exposure Layer: API-M (Fedora-API-M.wsdl) and API-A (Fedora-API-A.wsdl), each with SOAP and HTTP message protocols over http, smtp, or other transport protocols

Management Subsystem: Object Management, Component Management, Object Validation, PID Generation

Access Subsystem: Object Reflection, Dissemination

Security Subsystem: Policy Management, Policies, Users/Groups

Session Management Subsystem: User Authentication

Storage Subsystem: Digital Object Storage (XML Files, Relational DB); Datastream Storage (Managed Content; External Content Retriever reaching External Content Sources over http and ftp)

Local Services (svc1, svc2) and Remote Behavior Mechanism Services (reached over HTTP or SOAP)

Page 50: Sandy Payette Cornell Information Science

FEDORA Web Service API Definitions

“API-M” – interface for management sub-system
– Operations necessary to create and maintain objects and their components
– Interfaces directly with authoritative XML version of object

“API-A” – interface for access sub-system
– Operations necessary for clients to perform disseminations on objects in the repository
– No direct access to object internal structure or components
– Will work against cached representation of object to optimize performance

Page 51: Sandy Payette Cornell Information Science

Fedora Management Sub-System: Implements API-M

Object Management

Object Component Management

Object Validation

PID Generation

Interacts with Storage Subsystem

Page 52: Sandy Payette Cornell Information Science

Other Sub-systems

Storage Sub-system
– Responsible for all matters pertaining to reading and writing objects from persistent storage
– Modular design: can configure different object readers and writers to suit the context
– Modular design: can configure different data store strategies (in phase 1 will have file system and relational database)

Security Sub-system
– Store access control policies for repository and objects
– Store user and group information
– Enforcement of policies

Page 53: Sandy Payette Cornell Information Science

Security Sub-system: Access Control Policies

General Purpose
– “Only repository managers can add new disseminators to digital objects in the repository.”

Object-Specific (e.g., Lecture object)
– “Guests may view course syllabus and slides 1-10 of Lecture 1, but may not view the lecture video or any other slides.”
– “Students may not view Lecture 2 video unless they submit assignment for Lecture 1.”

See research at: http://www.cs.cornell.edu/payette/prism/security/policy.htm
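Two of the example policies can be written as simple rules to show what fine-grained enforcement has to decide. This is a toy sketch, not the policy language used in the research: the role names, action names, resource labels, and the default-deny choice are all assumptions.

```python
# Resources a guest may view: the syllabus and slides 1-10 of Lecture 1.
GUEST_VIEWABLE = {"syllabus"} | {f"lecture1/slide{n}" for n in range(1, 11)}


def is_permitted(role: str, action: str, resource: str) -> bool:
    """Evaluate a request against the two example policies."""
    # General-purpose policy: only repository managers add disseminators.
    if action == "add_disseminator":
        return role == "repository_manager"
    # Object-specific policy for the Lecture object.
    if action == "view" and role == "guest":
        return resource in GUEST_VIEWABLE
    # Anything not covered by a rule is denied (an assumption).
    return False


print(is_permitted("guest", "view", "lecture1/slide3"))
print(is_permitted("guest", "view", "lecture1/video"))
```

The third example policy, which depends on whether a student has submitted an assignment, would need request state beyond role and resource, which is part of why the briefing treats policy enforcement as its own subsystem.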

Page 54: Sandy Payette Cornell Information Science

Fedora Repository System

[Diagram repeated from the earlier "Fedora Repository System" slide: clients, the Fedora Web Service Exposure Layer (API-A and API-M), the Access, Management, Security, and Storage Subsystems, and Remote Behavior Mechanism Services]

Page 55: Sandy Payette Cornell Information Science

Fedora Access Sub-System: Implements API-A

Object Reflection
– Identify the types of Behavior Definitions to which an object subscribes (via the object’s Disseminators)
– Reflect on a Behavior Definition to identify the kinds of disseminations that can be run on the object (i.e., as method requests)

Dissemination
– Fulfills requests for particular methods (i.e., of a Behavior Definition) to be run on an object
– Mediates access to supporting services (i.e., Behavior Mechanisms) used to present or transform datastreams of the object
– Returns a view of the object’s content to client

Page 56: Sandy Payette Cornell Information Science

API-A: Object Reflection Requests: Identify Types of Behavior Definitions

Each Disseminator is said to “subscribe” to a Behavior Definition

It does this by referencing the PID of a particular Behavior Definition Object.

Each Behavior Definition Object contains metadata that describes a set of related behaviors (or operations)

Via API-A, clients can send a service request to determine what Behavior Definitions an object subscribes to.

Page 57: Sandy Payette Cornell Information Science

API-A: Object Reflection Request: Get Behavior Methods

Each Disseminator has a Behavior Definition Object associated with it.

Each Disseminator has a Behavior Mechanism Object associated with it that describes how to bind to a particular service that complies with the Disseminator’s Behavior Definition.

Via API-A, clients can send a service request to obtain the list of method definitions associated with a particular Disseminator of the digital object.

Page 58: Sandy Payette Cornell Information Science

API-A: Object Reflection Requests

[Diagram: requests sent through API-A to the Repository]

Digital Object (PID = 101): MrSID Image Object
– Disseminators: Web-default, Web-image, Admin
– System Metadata
– Basis (MrSID-encoded image file)

GetBehaviorDefinitions?PID=101
returns: Web-default, Web-image, Admin

GetBehaviorMethods?PID=101&BID=Web-default
returns: get-as-page; get-in-context
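The two reflection requests can be mimicked with an in-memory stand-in for the repository. This is only a sketch: the dictionary, the function names, and the method lists for Web-image and Admin are assumptions (the slide shows methods for Web-default only).

```python
# Hypothetical in-memory stand-in for a repository, keyed by PID.
REPOSITORY = {
    "101": {
        "Web-default": ["get-as-page", "get-in-context"],
        # Borrowed from the image content model slides (assumption).
        "Web-image": ["get_thumb", "get_med", "get_high", "get_veryhigh"],
        "Admin": [],  # methods not shown on the slide
    }
}


def get_behavior_definitions(pid: str) -> list:
    """Which Behavior Definitions does the object subscribe to?"""
    return list(REPOSITORY[pid])


def get_behavior_methods(pid: str, bid: str) -> list:
    """Which methods does one Behavior Definition offer on the object?"""
    return list(REPOSITORY[pid][bid])


print(get_behavior_definitions("101"))
print(get_behavior_methods("101", "Web-default"))
```

A client can therefore discover what an object can do before requesting any content from it.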

Page 59: Sandy Payette Cornell Information Science

API-A: Dissemination Request

Clients can obtain content from a digital object with minimal knowledge about the object.

Behavior Definition identifiers and method definitions are the basis for making dissemination requests on digital objects

Clients do not need to know particulars of how to attach to the service (Behavior Mechanism) that is operating on their behalf.

A dissemination request requires just three things:
– Digital Object Identifier (PID)
– Behavior Definition Identifier (BID)
– Method name (and optional parameters) for a behavior
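The three required pieces map directly onto a URL-style request like the GetDissemination example in this briefing. A minimal sketch using the standard `urllib.parse`; the function name and the base URL are hypothetical, and how optional parameters are named on the wire is an assumption.

```python
from urllib.parse import urlencode


def dissemination_url(base_url: str, pid: str, bid: str, method: str, **params) -> str:
    """Build a GetDissemination request from PID, BID, and method name."""
    query = {"PID": pid, "BID": bid, "method": method}
    query.update(params)  # optional method parameters, if any
    return f"{base_url}/GetDissemination?{urlencode(query)}"


# "http://repository.example.org" is a placeholder, not a real endpoint.
url = dissemination_url(
    "http://repository.example.org", "101", "Web-default", "get-as-page"
)
print(url)
```

Note that `urlencode` percent-encodes reserved characters, so a PID containing a colon would appear encoded in the query string.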

Page 60: Sandy Payette Cornell Information Science

API-A: Dissemination Request

[Diagram: a client page sends a dissemination request through API-A to the Repository]

Client: Bird Digital Library page listing White Birds: Image 1, Image 2, Image 3

Request: GetDissemination?PID=101&BID=Web-default&method=get-as-page

Digital Object 101: MrSID Image Object
– Disseminators: Web-default, Web-image, Admin
– System Metadata
– Basis (MrSID-encoded image file)

Result: image of bird

Page 61: Sandy Payette Cornell Information Science

Disseminations: Benefits

Simple access: dissemination requests shield clients from the internal structure of digital objects

Stable interface: dissemination requests are like requests against an abstract interface in that they are not tied to object implementation details that may change over time (e.g., storage locations of datastreams)

Foster Interoperability: different digital objects can vary in both the format of content and how it is structured, yet we can access them in a consistent manner via disseminations.

Page 62: Sandy Payette Cornell Information Science

The New FEDORA

Software Deployment

Page 63: Sandy Payette Cornell Information Science

Fedora Software Deployment Goals

An efficient, scalable, freely distributable FEDORA repository system ASAP

Make all software open source

Complete basic management and access interfaces with the initial release

Add other important digital library functionality in later releases

Create multiple testbed repositories to deploy and evaluate the software

Interoperability testing, including sharing of content and mechanisms among deployment partner repositories

Page 64: Sandy Payette Cornell Information Science

Deployment Group

Indiana University: Digital Library group

NYU: Humanities Computing group

Tufts: Digital Collections and Archives Department

King's College London: Humanities Computing

Oxford: Oxford Digital Library and The Refugee Studies Center

Library of Congress: Motion Picture and Recorded Sound Division

Northwestern University: library/academic computing

Los Alamos National Laboratory: Research Library

Page 65: Sandy Payette Cornell Information Science

Fedora Project Plan

Phase 1: (pre-release Oct 31, 2002; final Jan 2003)
– Repository system with management and access subsystems exposed as web services
– Storage subsystem with XML object store and replication to relational database cache
– Object builder tools (GUI and batch)
– Basic set of behavior services

Phase 2: Add more production support
– Security and policy enforcement
– Additional management tools
– Optimize performance for accessing XML objects
– Object versioning
– Collection objects
– Advanced disk management

Phase 3: Enhance end-user support
– New kinds of disseminators, with supporting behavior services
– Efficiency and scale optimization

Page 66: Sandy Payette Cornell Information Science

FEDORA Web Site: www.fedora.info

Page 67: Sandy Payette Cornell Information Science

Questions and Discussion