ERA Technology and Development Strategy (System Architecture Development)

Meg Phillips and Quyen Nguyen

ACERA 2011, April 6, 2011




Agenda

• System Requirements

• Design Approach

• System Architecture

• Related Work

• Conclusion


System Requirements

• Extensibility: record types, data types, and services can be added without extensive redesign.

• Evolvability: new technologies can be inserted using standard APIs and interfaces.

• Availability: key functions must be highly available.

• Scalability: the system must adapt to growth in record volume and in the user community.

• Security: the system and its assets must be protected.

• User-friendly: browser interface, intuitive design, Section 508 compliance.


Design Approach

• Develop the ERA Reference Architecture

– Correct deficiencies in I1

– Use the architecture as a tool to guide current and future design and development, starting with I3

• Goal: build a robust platform

– to develop, add, and enhance services and applications

– that adapts to change, especially changes in business rules

– that serves as the foundation for the Preservation and Access frameworks, whose components evolve at different paces:

• Fast pace for Access, driven by the Internet, Web 2.0, and social media

• Slow pace for Preservation

• Standard interfaces are key

– Open standards from the presentation layer to the backend layers

– Domain standards such as OAIS (Open Archival Information System) and PREMIS (PREservation Metadata: Implementation Strategies)

• Data-minded and Security-minded


Design Approach: Reference Architecture

• Facilitate system evolution to new technologies such as cloud computing, Web 2.0, social media, and future technologies.

– Long-term survivability of the system

– Take advantage of new technologies: potential reduction of lifecycle cost.

– Follow federal mandates and better serve the public (e.g., Open Government)

• The Reference Architecture helps us leverage community support

– Very important because some of this territory is uncharted

– Take advantage of community expertise

– Reduce development cost

• Well-defined system interfaces

• Well-defined Data and Metadata Model

• Publish Reference Architecture


System Architecture: Three Pillars

[Figure: Evolvable System Architecture]

System Architecture: SOA Paradigm

[Figure: Evolvable System Architecture]

OAIS Reference Model

[Figure: OAIS functional model. Functional entities: Ingest, Data Management, Archival Storage, Access, Preservation Planning, and Administration. External actors: Producer, Consumer, and Management. Flows: SIP, AIP, DIP, Description, Queries, and Results.]

Designing Services [mesoa 2009]

• Service-Oriented Architecture (SOA) Paradigm:

– Services

– Enterprise Service Bus (ESB)

• Starting from the OAIS model, design business services:

– Ingest

– Preservation

– Access

• Design lower-level services to support those business services

– Tool Services: Virus Scan, File Format Identification, etc.

– Common Services: Logging, Authorization, etc.

• The ESB, built on standards-based middleware, makes it possible to compose low-level services into business services

• Flexibility and extensibility: add and replace services
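As an illustration of how composition over a bus allows services to be added and replaced, here is a minimal Python sketch. It is not ERA's ESB or any real middleware API: `ServiceBus`, the dict-based message, and the three toy services are hypothetical, and the virus check is only a placeholder byte test.

```python
from typing import Callable, Dict, List

# Hypothetical message format: a plain dict passed from service to service.
Message = Dict[str, object]
Service = Callable[[Message], Message]


class ServiceBus:
    """Toy stand-in for an ESB: services are registered by name, and a
    business service is an ordered pipeline of registered services."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Registering under an existing name replaces the old service.
        self._services[name] = service

    def compose(self, steps: List[str]) -> Service:
        def business_service(msg: Message) -> Message:
            for step in steps:
                # Looked up at call time, so replacements take effect immediately.
                msg = self._services[step](msg)
            return msg
        return business_service


# Hypothetical low-level services (a tool service and a common service).
def virus_scan(msg: Message) -> Message:
    # Placeholder check only; a real deployment would call an actual scanner.
    return {**msg, "virus_free": b"EICAR" not in msg["content"]}


def identify_format(msg: Message) -> Message:
    fmt = "application/pdf" if msg["content"].startswith(b"%PDF-") else "unknown"
    return {**msg, "format": fmt}


def log(msg: Message) -> Message:
    return {**msg, "log": list(msg.get("log", [])) + [f"ingested {msg['name']}"]}


bus = ServiceBus()
bus.register("virus_scan", virus_scan)
bus.register("identify_format", identify_format)
bus.register("log", log)
ingest = bus.compose(["virus_scan", "identify_format", "log"])

result = ingest({"name": "memo.pdf", "content": b"%PDF-1.4 example"})
print(result["format"], result["virus_free"], result["log"])
```

Because each step is looked up by name when the business service runs, registering a new implementation under an existing name swaps it in without changing the composed Ingest service.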


Ingest Process


• The evolvable architecture allows integration with various file identification tools: DROID, JHOVE, JAI, pCOS, etc.

• Tools (COTS and FOSS) are wrapped as web services.

• Old tools can be replaced by new tools.

• New tools can be added.

• This capability allows the system to leverage open software developed by the digital library and archiving community.
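The plug-in idea on this slide can be sketched in Python as a registry of identification tools behind one service interface. This is a hypothetical sketch: the class and method names are invented, and the two magic-byte checks merely stand in for wrapped tools such as DROID or JHOVE, whose real interfaces are not modeled here.

```python
from typing import Callable, Dict, Optional

# A wrapped tool takes file bytes and returns a format name, or None when it
# cannot decide. Real tools (DROID, JHOVE, ...) would sit behind web service
# calls; the two magic-byte checks below are hypothetical placeholders.
Identifier = Callable[[bytes], Optional[str]]


class IdentificationService:
    def __init__(self) -> None:
        self._tools: Dict[str, Identifier] = {}

    def add_tool(self, name: str, tool: Identifier) -> None:
        # Adds a new tool, or replaces an old one registered under the same name.
        self._tools[name] = tool

    def remove_tool(self, name: str) -> None:
        self._tools.pop(name, None)

    def identify(self, data: bytes) -> Dict[str, Optional[str]]:
        # Ask every plugged-in tool so their answers can be compared.
        return {name: tool(data) for name, tool in self._tools.items()}


def sniff_pdf(data: bytes) -> Optional[str]:
    return "application/pdf" if data.startswith(b"%PDF-") else None


def sniff_tiff(data: bytes) -> Optional[str]:
    # Little-endian ("II*" + NUL) or big-endian ("MM" + NUL + "*") TIFF header.
    return "image/tiff" if data[:4] in (b"II*\x00", b"MM\x00*") else None


svc = IdentificationService()
svc.add_tool("pdf-sniffer", sniff_pdf)
svc.add_tool("tiff-sniffer", sniff_tiff)
print(svc.identify(b"II*\x00...image data..."))
```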


Preservation: Transformation Process


• The evolvable architecture allows integration with various transformation tools, depending on the file type.

• Tools are exposed as web services.

• For the same file type, a new transformation tool with better conversion can be added.

• For a new file type, a new transformation tool can also be added and used.
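A minimal sketch of per-file-type tool selection follows, assuming each wrapped tool advertises a source type and a numeric quality score. The score, the class names, and the lambda "converters" are all hypothetical. Registering a higher-scoring tool for an existing type, or any tool for a new type, requires no change to the service itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TransformationTool:
    name: str
    target_format: str
    quality: int                       # hypothetical score; higher means better conversion
    convert: Callable[[bytes], bytes]  # in ERA this would be a web service call


class TransformationService:
    def __init__(self) -> None:
        self._tools: Dict[str, List[TransformationTool]] = {}

    def register(self, source_format: str, tool: TransformationTool) -> None:
        # Works both for a better tool for a known type and for a new file type.
        self._tools.setdefault(source_format, []).append(tool)

    def transform(self, source_format: str, data: bytes) -> Tuple[str, str, bytes]:
        candidates = self._tools.get(source_format)
        if not candidates:
            raise LookupError(f"no transformation tool for {source_format!r}")
        best = max(candidates, key=lambda t: t.quality)
        return best.name, best.target_format, best.convert(data)


svc = TransformationService()
svc.register("doc", TransformationTool("doc2pdf-v1", "pdf", 1, lambda b: b"v1:" + b))
svc.register("doc", TransformationTool("doc2pdf-v2", "pdf", 2, lambda b: b"v2:" + b))
svc.register("wpd", TransformationTool("wpd2pdf", "pdf", 1, lambda b: b"wpd:" + b))
print(svc.transform("doc", b"memo"))
```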


Preservation: Future Choice of Strategy


• The evolvable architecture allows adding a new branch for a new preservation strategy.


System Architecture: Metadata Model

[Figure: Evolvable System Architecture]

ACE: Motivation

[Figure: Two examples of multiple versions of one record. Image of Gen. George B. McClellan: a TIFF digital master version (McClellan.tiff) and a JPEG online access version (McClellan.jpg), related by an ERA transformation tool. Memorandum: an MS Word .doc original version and a preservation version, related by an ERA transformation tool.]

ERA needs the capability to create and manage different versions of an electronic record and to relate them to a single logical entity, for example for preservation or redaction.

PREMIS-based ACE

ACE Structure

• Multiplicity of Representations and Objects

• Usage

• Preservation transformation

• Redaction

• Relationships

– With Business Objects

– Between representations

– Between Objects

• Multiple pages of a digitized record

• Extensible implementation that could be used in the future for:

– Archival Description

– Technical Metadata of Digitized Materials
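The ACE structure can be sketched as a small Python data model: one logical entity holding several representations (distinguished by usage), each with one or more objects (for example, pages of a digitized record), plus links to business objects and between representations. The class and field names are hypothetical and only loosely follow PREMIS; this is not ERA's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DigitalObject:
    object_id: str
    filename: str


@dataclass
class Representation:
    rep_id: str
    usage: str  # e.g. "original", "preservation", "access", "redacted"
    objects: List[DigitalObject] = field(default_factory=list)  # e.g. pages of a digitized record
    derived_from: Optional[str] = None  # rep_id this version was produced from


@dataclass
class ACE:
    ace_id: str
    business_object_ids: List[str] = field(default_factory=list)  # relationships with business objects
    representations: Dict[str, Representation] = field(default_factory=dict)

    def add(self, rep: Representation) -> None:
        # A derived representation must point at one already in this ACE.
        if rep.derived_from is not None and rep.derived_from not in self.representations:
            raise ValueError(f"unknown source representation {rep.derived_from!r}")
        self.representations[rep.rep_id] = rep

    def by_usage(self, usage: str) -> List[Representation]:
        return [r for r in self.representations.values() if r.usage == usage]

    def lineage(self, rep_id: str) -> List[str]:
        # Follow derived_from links back to the original representation.
        chain = [rep_id]
        while self.representations[chain[-1]].derived_from is not None:
            chain.append(self.representations[chain[-1]].derived_from)
        return chain


memo = ACE("ace-1", business_object_ids=["transfer-42"])
memo.add(Representation("r1", "original", [DigitalObject("o1", "memo.doc")]))
memo.add(Representation("r2", "preservation", [DigitalObject("o2", "memo.pdf")], derived_from="r1"))
print(memo.lineage("r2"))
```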


Archival Asset Package [nist 2010]

• Adheres to the Archival Information Package (AIP) of OAIS

• Self-contained digital object

• Data model used to import and export assets between services and systems
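The self-contained package idea can be sketched with the ZIP option mentioned in the backup slides: data files plus an XML manifest carrying metadata and checksums, written by one service and read back by another. The manifest layout, element names, and use of SHA-256 are assumptions for illustration, not the AAP design from [nist 2010].

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, Tuple


def export_aap(asset_id: str, files: Dict[str, bytes], metadata: Dict[str, str]) -> bytes:
    """Pack data files and a metadata manifest into one self-contained ZIP."""
    root = ET.Element("aap", id=asset_id)
    md = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(md, "field", name=key).text = value
    content = ET.SubElement(root, "content")
    for name, data in files.items():
        # Fixity: record a checksum for each file in the manifest.
        ET.SubElement(content, "file", name=name, sha256=hashlib.sha256(data).hexdigest())
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", ET.tostring(root, encoding="utf-8"))
        for name, data in files.items():
            zf.writestr("data/" + name, data)
    return buf.getvalue()


def import_aap(package: bytes) -> Tuple[str, Dict[str, bytes], Dict[str, str]]:
    """Unpack an AAP produced by export_aap, verifying every checksum."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
        metadata = {f.get("name"): f.text or "" for f in root.find("metadata")}
        files = {}
        for entry in root.find("content"):
            name = entry.get("name")
            data = zf.read("data/" + name)
            if hashlib.sha256(data).hexdigest() != entry.get("sha256"):
                raise ValueError(f"checksum mismatch for {name}")
            files[name] = data
    return root.get("id"), files, metadata


pkg = export_aap("1.1-6-200902.1", {"memo.doc": b"original bytes"}, {"title": "Memorandum"})
print(import_aap(pkg))
```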

System Architecture: Content Server

[Figure: Evolvable System Architecture]

Content Server within OAIS Model

[Figure: OAIS functional model (Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration; Producer, Consumer, Management; SIP, AIP, DIP, Description, Queries, Results) with the Content Server placed within it.]

Content Server [syscon 2010]

A Content Server is a logical construct to store and manage both data and metadata encapsulated in an Archival Asset Package (AAP):

– Insert, Retrieve, Update, Delete and Search

• Exposes a simple interface

– Hides the specific implementation of the underlying storage management system

• Allows the system to use various storage technologies

– The system can evolve to new technologies

• Allocation policy can be based on business needs and requirements

– Different data collections: Federal, Presidential, Legislative, Census

– Security and access control considerations
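The Content Server contract (insert, retrieve, update, delete, and search over AAPs behind a simple interface) can be sketched as a Python abstract base class with one interchangeable backend. The method signatures and the in-memory backend are hypothetical; a real backend would wrap an actual storage management system.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ContentServer(ABC):
    """The simple interface callers see; the storage technology stays hidden."""

    @abstractmethod
    def insert(self, aap_id: str, aap: bytes, metadata: Dict[str, str]) -> None: ...

    @abstractmethod
    def retrieve(self, aap_id: str) -> bytes: ...

    @abstractmethod
    def update(self, aap_id: str, aap: bytes, metadata: Dict[str, str]) -> None: ...

    @abstractmethod
    def delete(self, aap_id: str) -> None: ...

    @abstractmethod
    def search(self, **criteria: str) -> List[str]: ...


class InMemoryContentServer(ContentServer):
    """One possible backend; a disk, tape, or cloud backend would implement the same methods."""

    def __init__(self) -> None:
        self._aaps: Dict[str, bytes] = {}
        self._metadata: Dict[str, Dict[str, str]] = {}

    def insert(self, aap_id, aap, metadata):
        if aap_id in self._aaps:
            raise KeyError(f"{aap_id} already exists")
        self._aaps[aap_id] = aap
        self._metadata[aap_id] = dict(metadata)

    def retrieve(self, aap_id):
        return self._aaps[aap_id]

    def update(self, aap_id, aap, metadata):
        if aap_id not in self._aaps:
            raise KeyError(aap_id)
        self._aaps[aap_id] = aap
        self._metadata[aap_id] = dict(metadata)

    def delete(self, aap_id):
        del self._aaps[aap_id]
        del self._metadata[aap_id]

    def search(self, **criteria):
        # Metadata search: every given field must match exactly.
        return sorted(i for i, md in self._metadata.items()
                      if all(md.get(k) == v for k, v in criteria.items()))


cs: ContentServer = InMemoryContentServer()
cs.insert("1.1-6-200902.1", b"<aap bytes>", {"collection": "Federal"})
print(cs.search(collection="Federal"))
```

Callers depend only on `ContentServer`, so swapping `InMemoryContentServer` for another backend leaves them unchanged, which is the evolvability property the slide claims.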


Related Work

• Survey of system architectures designed and developed for digital preservation and archives

• Validation of our approach

• Evolvability, extensibility and pluggability of services achieved by SOA

– Planets project funded by the European Union

– National Library of Australia

– Portuguese National Archives RODA (Repository of Authentic Digital Objects), etc.

• Content Server is similar to Content Manager, used in the Royal Dutch Library's system based on IBM's Digital Information Archiving System (DIAS).


Conclusion: Summary

• The ERA Reference Architecture is evolvable and extensible thanks to the synergy of three pillars:

– SOA paradigm

– Metadata model

– Content Server concept

• Based on open standards: OAIS, PREMIS, XML, Web Services

• Implemented in the I3 release

• Benefits seen in Option Year 5

– Ease of modifying and branching existing workflows

– Reuse of underlying services

– Easier development of the Preservation Transformation framework

– Creation of Transformation Strategy and Job Definition based on XForms and workflow middleware

• Positioned to take advantage of software tools and components developed by the digital preservation community


Conclusion: Future Direction

• NARA internal community

– Externalize architectural components such as the ESB and web services to promote reuse

– Publish well-defined system interfaces

– Publish a well-defined data and metadata model

• Federal agencies

– Share experience with, and learn from, other agencies such as LOC, GPO, and NASA

• Larger community

– Collaborate with other archives and digital libraries

– Collaborate with the research community on Ingest, Preservation, and Access functionalities

– Identify areas where free and open source software could be used


Publications

[syscon 2010] Quyen L. Nguyen, Alla Lake and Mark Huber. “Evolvable and Scalable System of Content Servers for a Large Digital Preservation Archives”. Proceedings of the 4th Annual IEEE Systems Conference, April 5-8, 2010, San Diego.

[nist 2010] Quyen L. Nguyen and Dyung Le. “Archival Asset Package Design Concept for an OAIS System”. Proceedings of US Workshop Roadmap development for Digital Preservation Interoperability Framework (DPIF). NIST, Gaithersburg, Maryland, March 29-31, 2010.

[mesoa 2009] Quyen L. Nguyen. “Towards a Design Approach for an Effective System Evolution of a Large Electronic Archive Information System”. Proceedings of 3rd International Workshop on a Research Agenda for Maintenance and Evolution of Service-Oriented Systems, September 20-26, 2009, Edmonton.

[balisage 2009] Quyen L. Nguyen and Betty Harvey. “Agile Business Objects Management Application for Electronic Records Archive Transfer Process”. Proceedings of Balisage, the Markup Conference 2009, Aug 11-14 2009, Montreal.


References

1. The Consultative Committee for Space Data Systems. “Reference Model for an Open Archival Information System (OAIS)”, 2002.

2. Preservation Metadata: Implementation Strategies (PREMIS). http://www.loc.gov/standards/premis/.

3. Robert Kahn and Robert Wilensky. “A Framework for Distributed Digital Objects”. International Journal on Digital Libraries (2006) 6(2): 115–123.

4. Adam Farquhar and Helen Hockx-Yu. “Planets: Integrated Services for Digital Preservation”. The International Journal of Digital Curation, Volume 2, Issue 2, 2007.

5. IBM DIAS for The Royal Dutch Library. http://www-935.ibm.com/services/nl/dias/ref/references.html.

6. National Library of Australia. http://www.nla.gov.au/dsp/documents/itag.pdf

7. Jose Carlos Ramalho et al. “RODA and Crib – a Service-Oriented Digital Repository”. http://repositorium.sdum.uminho.pt/bitstream/1822/8226/1/RodaAndCrib.pdf.


Thank You!

Meg Phillips, [email protected]

Quyen Nguyen, [email protected]


Backup Slides

Evolution Relativity [balisage 2009]

The upper timeline shows the evolution of the system itself; the lower timeline shows the evolution of the external systems that created the data to be archived. Note the lag of several years between the two timelines. The challenge is for the system to evolve to use the current technologies of epoch Ta in order to provide long-term access to data born of technologies from time Tc.

High-level Architecture Roadmap

[Figure: high-level architecture roadmap, with components marked In Place, In Progress, and Future.]

Planets Interoperability Framework

• Planets’ core components:

– Service bus and workflow

– Security, monitoring, transaction manager, etc.

• Evolvability and extensibility: allows third-party services to be plugged in


The Royal Dutch Library

• Based on IBM’s Digital Information Archiving System (DIAS)

• The core component is Content Manager, which stores and manages both data and metadata

– Library Server:

• Cataloging and indexing of metadata

• Facilitates search and retrieval

• Security control for access

– Object Server:

• Stores the actual digital objects


Physical Implementation of AAP

• ZIP and URL options for encapsulating files.

N-Part Identifier

• Uniqueness within ERA

• Can be integrated with current and future standard protocols such as Handle, DOI, PURL, etc.

• Allows access to different levels of the ACE structure

• Identifiers can be assigned in a decentralized system

Example: ID of an electronic asset and its metadata: 1.1-6-200902.1

The N-part ID can be made globally unique by prefixing it with the ERA namespace. For instance, if “era.nara.gov” is used, the above ID becomes:

http://www.era.nara.gov/1.1-6-200902.1
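A minimal sketch of handling N-part identifiers follows, assuming segments are digit runs separated by "." or "-" (the slides do not state the grammar or what each segment means). It validates and splits an ID, prefixes it with the ERA namespace used in the slide's example, and shows one hypothetical way decentralized assignment could work: allocators that each own a distinct, non-nested prefix.

```python
import re
from typing import List

ERA_NAMESPACE = "http://www.era.nara.gov/"  # prefix taken from the slide's example

# Assumption: segments are runs of digits separated by "." or "-".
_ID_PATTERN = re.compile(r"\d+(?:[.-]\d+)*")


def segments(era_id: str) -> List[str]:
    """Split an N-part ID into its segments (kept as strings to preserve leading zeros)."""
    if not _ID_PATTERN.fullmatch(era_id):
        raise ValueError(f"not an N-part ID: {era_id!r}")
    return re.split(r"[.-]", era_id)


def globalize(era_id: str, namespace: str = ERA_NAMESPACE) -> str:
    segments(era_id)  # validate before prefixing
    return namespace + era_id


def localize(uri: str, namespace: str = ERA_NAMESPACE) -> str:
    if not uri.startswith(namespace):
        raise ValueError(f"{uri!r} is not in namespace {namespace!r}")
    era_id = uri[len(namespace):]
    segments(era_id)
    return era_id


class SegmentAllocator:
    """Hypothetical decentralized assignment: each allocator owns a distinct
    prefix and numbers children under it, so no central coordination is needed."""

    def __init__(self, prefix: str) -> None:
        segments(prefix)
        self.prefix = prefix
        self._next = 1

    def new_id(self) -> str:
        new_id = f"{self.prefix}.{self._next}"
        self._next += 1
        return new_id


alloc = SegmentAllocator("1.1-6-200902")
first = alloc.new_id()
print(first, segments(first), globalize(first))
```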


Possible Support of HTTP Protocol

[Figure: resolution of an ERA-HTTP request]

1. HTTP Web Server: receives the HTTP request (ERA-HTTP example: http://era.gov/1.6-1-200903.123) and passes the ERA-ID to the ERA ID Resolver.

2. ERA ID Resolver: resolves the 1st and 2nd segments of the ERA-ID to a Content Server ID (example: CS #6).

3. Resolver in Content Server: resolves the remainder of the ERA-ID, if necessary, to a Storage ID.

4. Storage: holds the object at its physical location.
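The two-stage resolution in this diagram can be sketched as follows. The routing table, the storage path, and all class and function names are hypothetical, and the "." / "-" segment delimiters are an assumption; the example ERA-HTTP URL and CS #6 come from the slide.

```python
import re
from typing import Dict, Tuple
from urllib.parse import urlparse


class EraIdResolver:
    """Stage 1: map the 1st and 2nd segments of an ERA-ID to a Content Server."""

    def __init__(self, routing: Dict[Tuple[str, str], int]) -> None:
        self.routing = routing  # hypothetical table: (segment 1, segment 2) -> CS number

    def resolve(self, url: str) -> Tuple[int, str]:
        era_id = urlparse(url).path.lstrip("/")
        parts = re.split(r"[.-]", era_id)  # assumed delimiters
        if len(parts) < 2:
            raise ValueError(f"ERA-ID too short: {era_id!r}")
        return self.routing[(parts[0], parts[1])], era_id


class ContentServerResolver:
    """Stage 2: inside one Content Server, map the ERA-ID to a storage location."""

    def __init__(self, locations: Dict[str, str]) -> None:
        self.locations = locations  # hypothetical: ERA-ID -> physical location

    def resolve(self, era_id: str) -> str:
        return self.locations[era_id]


def handle_http_request(url: str, id_resolver: EraIdResolver,
                        servers: Dict[int, ContentServerResolver]) -> str:
    # The HTTP web server hands the request to the ERA ID Resolver,
    # then to the resolver of the Content Server it selected.
    cs_number, era_id = id_resolver.resolve(url)
    return servers[cs_number].resolve(era_id)


resolver = EraIdResolver({("1", "6"): 6})
servers = {6: ContentServerResolver({"1.6-1-200903.123": "/storage/cs6/200903/123"})}
print(handle_http_request("http://era.gov/1.6-1-200903.123", resolver, servers))
```

Splitting resolution this way keeps the global resolver small (it only knows which Content Server owns a prefix), while each Content Server remains free to change its internal storage layout.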

[Figure: federation of Content Servers. Several Content Servers, each with ERA Storage and Metadata Management, are reached through policy-configured federators: an AAP federator that routes each AIP to the correct Content Server, and a query federator that federates queries and returns results. Components exchanging AAPs, queries, and results with the federators: Data Management, the Access Subsystem (which returns DIPs and uses Access Working Storage, AWS), Business Object Management, Description Management, and Preservation.]

Standard operations and interfaces:

• Put AAP

• Get AAP

• Update AAP

• Delete AAP

• Search: (1) metadata, (2) asset
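The federators in this diagram can be sketched as a router that applies a configurable allocation policy to place each AAP on one Content Server, and that fans queries out to all of them. The collection-based policy, the stub server, and every name here are hypothetical illustrations of the policy-driven routing the slide describes.

```python
from typing import Callable, Dict, List, Tuple


class ContentServerStub:
    """Minimal stand-in for one Content Server (put, get, metadata search)."""

    def __init__(self) -> None:
        self.aaps: Dict[str, Tuple[bytes, Dict[str, str]]] = {}

    def put(self, aap_id: str, aap: bytes, metadata: Dict[str, str]) -> None:
        self.aaps[aap_id] = (aap, dict(metadata))

    def get(self, aap_id: str) -> bytes:
        return self.aaps[aap_id][0]

    def search(self, **criteria: str) -> List[str]:
        return [i for i, (_, md) in self.aaps.items()
                if all(md.get(k) == v for k, v in criteria.items())]


# Hypothetical policy configuration: one Content Server per collection.
def collection_policy(metadata: Dict[str, str]) -> str:
    table = {"Federal": "cs-federal", "Presidential": "cs-presidential"}
    return table.get(metadata.get("collection", ""), "cs-other")


class Federator:
    def __init__(self, servers: Dict[str, ContentServerStub],
                 policy: Callable[[Dict[str, str]], str]) -> None:
        self.servers = servers
        self.policy = policy
        self.location: Dict[str, str] = {}  # AAP id -> Content Server name

    def put(self, aap_id: str, aap: bytes, metadata: Dict[str, str]) -> str:
        # Route the AIP to the correct Content Server according to the policy.
        target = self.policy(metadata)
        self.servers[target].put(aap_id, aap, metadata)
        self.location[aap_id] = target
        return target

    def get(self, aap_id: str) -> bytes:
        return self.servers[self.location[aap_id]].get(aap_id)

    def search(self, **criteria: str) -> List[Tuple[str, str]]:
        # Federate the query: ask every Content Server and merge the results.
        hits = []
        for name, server in self.servers.items():
            hits.extend((name, aap_id) for aap_id in server.search(**criteria))
        return sorted(hits)


fed = Federator({n: ContentServerStub() for n in ("cs-federal", "cs-presidential", "cs-other")},
                collection_policy)
fed.put("a1", b"...", {"collection": "Federal", "year": "2009"})
fed.put("a2", b"...", {"collection": "Presidential", "year": "2009"})
fed.put("a3", b"...", {"collection": "Census", "year": "2010"})
print(fed.search(year="2009"))
```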

Federators: Global & Local

• If the requested asset is in OPA’s local storage, send the asset directly to the requestor.

• If the requested asset with the given URI is in ERA, the request is pooled and forwarded to the ERA system, which pushes the asset.

Potential Access to Records in ERA
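The local-first access logic on this slide can be sketched as a small gateway. Reading "pooled" as "collected, de-duplicated, and forwarded in a batch" is an assumption, as are the class name, the example URIs, and the callback that stands in for ERA pushing an asset.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class AccessGateway:
    """Local-first access: serve from local storage, otherwise pool the request for ERA."""

    def __init__(self, local_storage: Dict[str, bytes]) -> None:
        self.local = local_storage
        self.pending: Deque[str] = deque()

    def request(self, uri: str) -> Optional[bytes]:
        if uri in self.local:
            return self.local[uri]      # asset is local: send it to the requestor now
        if uri not in self.pending:
            self.pending.append(uri)    # pool repeated requests for the same URI
        return None                     # the asset arrives later, when ERA pushes it

    def forward_pending(self, era_push: Callable[[str], bytes]) -> int:
        """Forward pooled requests; each asset ERA pushes back is stored locally."""
        forwarded = 0
        while self.pending:
            uri = self.pending.popleft()
            self.local[uri] = era_push(uri)
            forwarded += 1
        return forwarded


gw = AccessGateway({"opa/photo-1": b"jpeg bytes"})
print(gw.request("opa/photo-1"))           # served locally
print(gw.request("era/1.6-1-200903.123"))  # None: pooled for ERA
```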


Potential Reuse of Services

• Cross-use of services facilitated by the ESB