ERA Technology and Development Strategy
(System Architecture Development)
Meg Phillips and Quyen Nguyen
ACERA - April 2011
ACERA 2011
April 6, 2011
Agenda
• System Requirements
• Design Approach
• System Architecture
• Related Work
• Conclusion
System Requirements
• Extensibility: record types, data types, and services can be added without extensive redesign.
• Evolvability: new technologies can be inserted using standard APIs and interfaces.
• Availability: key functions must be highly available.
• Scalability: adapts to growth in record volume and user community.
• Security: protection of the system and its assets.
• User friendliness: intuitive browser interface, Section 508 compliance.
Design Approach
• Develop ERA Reference Architecture
– Correct deficiencies in I1
– Architecture Tool to guide current and future design and development starting with I3
• Goal is to build a Robust Platform
– To develop, add and enhance services and applications
– Adaptive to changes, especially business rules
– Foundation for the Preservation and Access frameworks, whose components evolve at different paces
• Fast pace for Access due to the Internet, Web 2.0, and Social Media
• Slow pace for Preservation
• Standard Interface is key
– Open standards from Presentation to Backend layers
– Domain standards such as OAIS (Open Archival Information System) and PREMIS (PREservation Metadata: Implementation Strategies)
• Data-minded and Security-minded
Design Approach: Reference Architecture
• Facilitate system evolution to new technologies such as Cloud Computing, Web 2.0 Social Media, and future technologies.
– Long term survivability of system
– Take advantage of new technologies: potential reduction of lifecycle cost.
– Follow federal mandate and better serve public (e.g. Open Gov.)
• Reference Architecture helps us leverage Community Support
– Very important due to some uncharted territory
– Take advantage of community expertise
– Reduce development cost
• Well-defined system interfaces
• Well-defined Data and Metadata Model
• Publish Reference Architecture
System Architecture: Three Pillars
[Figure: Evolvable System Architecture]
System Architecture: SOA Paradigm
[Figure: Evolvable System Architecture]
OAIS Reference Model
[Figure: OAIS reference model. Functional entities: Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration. External actors: PRODUCER, MANAGEMENT, CONSUMER. Flows: SIP, AIP, DIP, Description, Queries, Results.]
Designing Services [mesoa 2009]
• Service-Oriented Architecture (SOA) Paradigm:
– Services
– Enterprise Service Bus (ESB)
• Starting from OAIS model, design Business Services:
– Ingest
– Preservation
– Access
• Design lower level services to support those Business Services
– Tool Services: Virus Scan, File Format Identification, etc.
– Common Services: Logging, Authorization, etc.
• Composition of low-level services into business services made possible by ESB with standards-based middleware
• Flexibility and extensibility: add and replace services
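The composition idea on this slide can be illustrated with a minimal sketch. It is not the ERA implementation: the `ServiceBus` class, the service names, and the toy checks inside each service are all hypothetical stand-ins for an ESB with standards-based middleware.

```python
from typing import Callable, Dict, List

# A service takes a record (a dict) and returns an updated copy of it.
Service = Callable[[dict], dict]


class ServiceBus:
    """Toy stand-in for an ESB: services are looked up by name at call time."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Registering under an existing name replaces the old implementation.
        self._services[name] = service

    def compose(self, names: List[str]) -> Service:
        # Build a business service that calls the named services in order.
        def business_service(record: dict) -> dict:
            for name in names:
                record = self._services[name](record)
            return record

        return business_service


# Hypothetical low-level services (tool and common services).
def virus_scan(record: dict) -> dict:
    return {**record, "virus_free": b"EICAR" not in record["content"]}


def identify_format(record: dict) -> dict:
    fmt = "PDF" if record["content"].startswith(b"%PDF") else "unknown"
    return {**record, "format": fmt}


def log_event(record: dict) -> dict:
    return {**record, "log": record.get("log", []) + ["ingested"]}


bus = ServiceBus()
bus.register("virus_scan", virus_scan)
bus.register("identify_format", identify_format)
bus.register("logging", log_event)

# The Ingest business service is a composition of lower-level services.
ingest = bus.compose(["virus_scan", "identify_format", "logging"])
result = ingest({"name": "memo.pdf", "content": b"%PDF-1.4 ..."})
```

Because the composed service looks each step up by name when it runs, registering a new implementation under `"identify_format"` changes the behavior of `ingest` without touching the composition itself.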
Ingest Process
• Evolvable Architecture allows integration with various file identification tools: DROID, JHOVE, JAI, pCOS, etc.
• Web Services made out of Tools (COTS, FOSS)
• Old tools can be replaced by new tools.
• New tools can be added.
• Capability allows the system to leverage open software developed by the digital library and archiving community.
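A minimal sketch of how identification tools might be added or replaced behind one service follows. It does not call DROID, JHOVE, JAI, or pCOS; the `magic_*` functions are hypothetical stand-ins that only check well-known leading bytes, and the `IdentificationService` class is an assumption made for illustration.

```python
from typing import Callable, List, Optional

# Each tool returns a format name, or None when it does not recognise the bytes.
Identifier = Callable[[bytes], Optional[str]]


def magic_pdf(data: bytes) -> Optional[str]:
    return "PDF" if data.startswith(b"%PDF-") else None


def magic_tiff(data: bytes) -> Optional[str]:
    # TIFF files start with a byte-order mark followed by the number 42.
    return "TIFF" if data[:4] in (b"II*\x00", b"MM\x00*") else None


def magic_jpeg(data: bytes) -> Optional[str]:
    return "JPEG" if data.startswith(b"\xff\xd8\xff") else None


class IdentificationService:
    """Tries each registered tool in order and returns the first answer."""

    def __init__(self, tools: List[Identifier]) -> None:
        self.tools = list(tools)

    def add_tool(self, tool: Identifier) -> None:
        self.tools.append(tool)

    def replace_tool(self, old: Identifier, new: Identifier) -> None:
        self.tools[self.tools.index(old)] = new

    def identify(self, data: bytes) -> str:
        for tool in self.tools:
            fmt = tool(data)
            if fmt is not None:
                return fmt
        return "unknown"


service = IdentificationService([magic_pdf, magic_tiff])
jpeg_bytes = b"\xff\xd8\xff\xe0"
before = service.identify(jpeg_bytes)   # no JPEG tool registered yet
service.add_tool(magic_jpeg)
after = service.identify(jpeg_bytes)    # recognised once the tool is added
```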
Preservation: Transformation Process
• Evolvable Architecture allows integration with various transformation tools depending on the file types.
• Tools are web services
• For the same file type, a new transformation tool with better conversion can be added.
• For a new file type, a new transformation tool can also be added and used.
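The two cases above (a better tool for an existing file type, and a first tool for a new file type) can be sketched with a small registry. The `TransformationRegistry` class, the rule that the most recently added tool is preferred, and the converter functions are all assumptions for illustration; in the architecture described here the tools would be web services.

```python
from typing import Callable, Dict, List

Transformer = Callable[[bytes], bytes]


class TransformationRegistry:
    """Maps a file type to its transformation tools, newest first."""

    def __init__(self) -> None:
        self._tools: Dict[str, List[Transformer]] = {}

    def add(self, file_type: str, tool: Transformer) -> None:
        # Assumption: the most recently added tool for a type is preferred.
        self._tools.setdefault(file_type, []).insert(0, tool)

    def transform(self, file_type: str, data: bytes) -> bytes:
        tools = self._tools.get(file_type)
        if not tools:
            raise LookupError(f"no transformation tool registered for {file_type!r}")
        return tools[0](data)


# Hypothetical converters; real ones would call out to a transformation service.
def doc_converter_v1(data: bytes) -> bytes:
    return b"v1:" + data


def doc_converter_v2(data: bytes) -> bytes:
    return b"v2:" + data


def wpd_converter(data: bytes) -> bytes:
    return b"wpd:" + data


registry = TransformationRegistry()
registry.add("doc", doc_converter_v1)
first = registry.transform("doc", b"memo")

# Same file type, better tool: it takes over without changing callers.
registry.add("doc", doc_converter_v2)
second = registry.transform("doc", b"memo")

# New file type: register a tool and it becomes usable.
registry.add("wpd", wpd_converter)
third = registry.transform("wpd", b"letter")
```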
Preservation: Future Choice of Strategy
• Evolvable Architecture allows adding a new branch for a new preservation strategy
System Architecture: Metadata Model
[Figure: Evolvable System Architecture]
ACE: Motivation
[Figure: two examples of a record held in multiple versions. Image of Gen. George B. McClellan: McClellan.tiff (TIFF, Digital Master Version) and McClellan.jpg (JPEG, Online Access Version). Memorandum: MS Word .doc (Original Version), passed through the ERA Transformation Tool to produce a Preservation Version.]
ERA needs the capability to create and manage different versions of an electronic record (for example, for preservation or redaction) and to relate them to a single logical entity.
ACE Structure
• Multiplicity of Representations and Objects
• Usage
• Preservation transformation
• Redaction
• Relationships
– With Business Objects
– Between representations
– Between Objects
• Multiple pages of a digitized record
• Extensible implementation which could be used in future for:
– Archival Description
– Technical Metadata of Digitized Materials
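One way to picture a single logical entity with several representations is the data-structure sketch below. It is not the ACE schema: the class names, the `usage` and `derived_from` fields, and the assumption that the access JPEG was derived from the master TIFF are all illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DigitalObject:
    filename: str
    file_format: str


@dataclass
class Representation:
    # One version of the record, e.g. "digital master", "online access", "redacted".
    usage: str
    objects: List[DigitalObject] = field(default_factory=list)
    # Relationship between representations (assumed here to mean "produced from").
    derived_from: Optional["Representation"] = None


@dataclass
class LogicalRecord:
    # The single logical entity that all representations relate to.
    title: str
    representations: List[Representation] = field(default_factory=list)

    def by_usage(self, usage: str) -> List[Representation]:
        return [r for r in self.representations if r.usage == usage]


master = Representation("digital master", [DigitalObject("McClellan.tiff", "TIFF")])
access = Representation(
    "online access",
    [DigitalObject("McClellan.jpg", "JPEG")],
    derived_from=master,
)
record = LogicalRecord("Image of Gen. George B. McClellan", [master, access])
```

A multi-page digitized record would fit the same shape by placing one `DigitalObject` per page in a representation's `objects` list.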
Archival Asset Package [nist 2010]
• Adherence to Archival Information Package (AIP) in OAIS
• Self-contained digital object
• Data model used to Import & Export between services and systems
System Architecture: Content Server
[Figure: Evolvable System Architecture]
Content Server within OAIS Model
[Figure: OAIS reference model with the Content Server marked. Functional entities: Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration. External actors: PRODUCER, MANAGEMENT, CONSUMER. Flows: SIP, AIP, DIP, Description, Queries, Results.]
Content Server [syscon 2010]
• A Content Server is a logical construct to store and manage both data and metadata encapsulated in an Archival Asset Package (AAP):
– Insert, Retrieve, Update, Delete and Search
• Expose a simple interface
• Hide specific implementation of underlying storage management system
• Allow the system to have various technologies
• System can evolve to new technologies
• Allocation Policy can be based on business needs and requirements
– Different data collections: Federal, Presidential, Legislative, Census
– Security and access control considerations
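The "simple interface that hides the storage implementation" idea can be sketched as an abstract class with one concrete backend. This is an illustration only: the method signatures, the dict layout of an AAP, and the in-memory backend are assumptions, not the interface published in [syscon 2010].

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ContentServer(ABC):
    """Callers see only these five operations, never the storage technology."""

    @abstractmethod
    def insert(self, aap_id: str, aap: dict) -> None: ...

    @abstractmethod
    def retrieve(self, aap_id: str) -> dict: ...

    @abstractmethod
    def update(self, aap_id: str, aap: dict) -> None: ...

    @abstractmethod
    def delete(self, aap_id: str) -> None: ...

    @abstractmethod
    def search(self, **criteria: str) -> List[str]: ...


class InMemoryContentServer(ContentServer):
    """Hypothetical backend; another class could wrap a database or object store."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def insert(self, aap_id: str, aap: dict) -> None:
        if aap_id in self._store:
            raise KeyError(f"{aap_id} already exists")
        self._store[aap_id] = aap

    def retrieve(self, aap_id: str) -> dict:
        return self._store[aap_id]

    def update(self, aap_id: str, aap: dict) -> None:
        if aap_id not in self._store:
            raise KeyError(aap_id)
        self._store[aap_id] = aap

    def delete(self, aap_id: str) -> None:
        del self._store[aap_id]

    def search(self, **criteria: str) -> List[str]:
        # Return IDs of AAPs whose metadata matches every given field.
        return sorted(
            aap_id
            for aap_id, aap in self._store.items()
            if all(aap["metadata"].get(k) == v for k, v in criteria.items())
        )


server: ContentServer = InMemoryContentServer()
server.insert("a1", {"metadata": {"collection": "Census"}, "data": b"..."})
server.insert("a2", {"metadata": {"collection": "Federal"}, "data": b"..."})
census_ids = server.search(collection="Census")
```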
Related Work
• Survey of system architectures designed and developed for digital preservation and archives
• Validation of our approach
• Evolvability, extensibility and pluggability of services achieved by SOA
– Planets project funded by the European Union
– National Library of Australia
– Portuguese National Archives RODA (Repository of Authentic Digital Objects), etc.
• Content Server similar to Content Manager used in the system of the Royal Dutch Library based on IBM Digital Information Archiving System (DIAS).
Conclusion: Summary
• ERA Reference Architecture is evolvable and extensible thanks to the synergy of the three pillars:
– SOA Paradigm
– Metadata Model
– Content Server Concept
• Based on open standards: OAIS, PREMIS, XML, Web Services
• Implemented in the I3 release
• Benefits seen in Option Year 5
– Ease of modifying and branching existing workflows
– Reuse of underlying services
– Facilitate development of Preservation Transformation framework
– Creation of Transformation Strategy and Job Definition based on XForms and workflow middleware
• Positioned to take advantage of software tools and components developed by the digital preservation community
Conclusion: Future Direction
• NARA Internal Community
– Externalize architectural components such as ESB and Web Services to promote reuse.
– Publish well-defined system interfaces
– Publish well-defined Data and Metadata Model
• Federal Agencies
– Share and learn experience with other agencies such as LOC, GPO, NASA, and others.
• Larger Community
– Collaboration with other archives and digital libraries
– Collaboration with Research community for Ingest, Preservation and Access functionalities.
– Identify areas of possible usage of Free Open Source Software
Publications
[syscon 2010] Quyen L. Nguyen, Alla Lake and Mark Huber. “Evolvable and Scalable System of Content Servers for a Large Digital Preservation Archives”. Proceedings of 4th Annual IEEE Systems Conference, April 5-8, 2010, San Diego.
[nist 2010] Quyen L. Nguyen and Dyung Le. “Archival Asset Package Design Concept for an OAIS System”. Proceedings of US Workshop Roadmap development for Digital Preservation Interoperability Framework (DPIF). NIST, Gaithersburg, Maryland, March 29-31, 2010.
[mesoa 2009] Quyen L. Nguyen. “Towards a Design Approach for an Effective System Evolution of a Large Electronic Archive Information System”. Proceedings of 3rd International Workshop on a Research Agenda for Maintenance and Evolution of Service-Oriented Systems, September 20-26, 2009, Edmonton.
[balisage 2009] Quyen L. Nguyen and Betty Harvey. “Agile Business Objects Management Application for Electronic Records Archive Transfer Process”. Proceedings of Balisage, the Markup Conference 2009, Aug 11-14 2009, Montreal.
References
1. The Consultative Committee for Space Data Systems. “Reference Model for an Open Archival Information System (OAIS)”, 2002.
2. Preservation Metadata: Implementation Strategies (PREMIS). http://www.loc.gov/standards/premis/.
3. Robert Kahn and Robert Wilensky. “A Framework for Distributed Digital Objects”. International Journal on Digital Libraries (2006) 6(2): 115–123.
4. Adam Farquhar and Helen Hockx-Yu. “Planets: Integrated Services for Digital Preservation”. The International Journal of Digital Curation, Issue 2, Volume 2 | 2007.
5. IBM DIAS for The Royal Dutch Library. http://www-935.ibm.com/services/nl/dias/ref/references.html.
6. National Library of Australia. http://www.nla.gov.au/dsp/documents/itag.pdf
7. Jose Carlos Ramalho et al. “RODA and Crib – a Service-Oriented Digital Repository”. http://repositorium.sdum.uminho.pt/bitstream/1822/8226/1/RodaAndCrib.pdf.
Thank You!
Meg Phillips: […]@nara.gov
Quyen Nguyen: […]@nara.gov
Evolution Relativity [balisage 2009]
• Upper timeline shows evolution of the system itself.
• Lower timeline shows evolution of the external systems that created the to-be-archived data.
• Note the lag between the two timelines (several years).
• Challenge: the system must evolve to use the current technologies of epoch Ta in order to provide long-term access to data born out of technologies at time Tc.
High-level Architecture Roadmap
[Figure: high-level architecture roadmap. Legend: In Place, In Progress, Future.]
Planets Interoperability Framework
• Planets’ core components:
– Service Bus, and workflow
– security, monitoring, transaction manager, etc.
• Evolvability and extensibility: allow plugging of third-party services
The Royal Dutch Library
• Based on IBM’s Digital Information Archiving System (DIAS)
• Core component is Content Manager to store and manage both data and metadata
– Library Server:
• Cataloging and indexing of metadata
• Facilitate search and retrieval
• Security Control for access
– Object Server
• Store actual digital objects
Physical Implementation of AAP
• ZIP and URL options for encapsulating files.
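The ZIP option can be sketched with Python's standard `zipfile` module. The internal layout used here (a `metadata.json` entry plus a `data/` folder) is purely an assumption for illustration; the actual AAP layout is described in [nist 2010] and is not reproduced here.

```python
import io
import json
import zipfile
from typing import Dict, Tuple


def pack_aap(metadata: dict, files: Dict[str, bytes]) -> bytes:
    """Bundle metadata and data files into one self-contained ZIP blob."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical layout: one metadata entry, data files under data/.
        archive.writestr("metadata.json", json.dumps(metadata, sort_keys=True))
        for name, content in files.items():
            archive.writestr(f"data/{name}", content)
    return buffer.getvalue()


def unpack_aap(blob: bytes) -> Tuple[dict, Dict[str, bytes]]:
    """Recover the metadata and data files from a packed blob."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        metadata = json.loads(archive.read("metadata.json"))
        files = {
            name[len("data/"):]: archive.read(name)
            for name in archive.namelist()
            if name.startswith("data/")
        }
    return metadata, files


blob = pack_aap(
    {"title": "Image of Gen. George B. McClellan"},
    {"McClellan.tiff": b"II*\x00...", "McClellan.jpg": b"\xff\xd8\xff..."},
)
metadata, files = unpack_aap(blob)
```

Because the blob carries its own metadata, it can be exported from one service and imported by another without a side channel. A URL option would instead store references to the files rather than the bytes themselves.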
N-Part Identifier
• Uniqueness
– Within ERA
• Can be integrated with current and future standard protocols such as Handle, DOI, PURL, etc.
• Allows access to different levels of the ACE structure
• Identifiers can be assigned in a decentralized system
• Example: ID of an Electronic Asset & its Metadata: 1.1–6–200902.1
• The N-part ID can be made globally unique by prefixing it with the ERA namespace. For instance, if “era.nara.gov” is used, then the above ID becomes: http://www.era.nara.gov/1.1-1-200902.1
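Prefixing a local ID with a namespace can be sketched in a few lines. The function names are hypothetical, the default host is taken from the URL shown on the slide, and the assumption that parts are separated by hyphens is inferred from the example ID rather than stated in the source.

```python
from typing import List
from urllib.parse import urlsplit


def to_global_id(local_id: str, host: str = "www.era.nara.gov") -> str:
    """Make an N-part ID globally unique by prefixing the namespace host."""
    return f"http://{host}/{local_id}"


def to_local_id(global_id: str) -> str:
    """Strip the namespace again to recover the ID used within ERA."""
    return urlsplit(global_id).path.lstrip("/")


def split_parts(local_id: str) -> List[str]:
    # Assumption: the parts of the identifier are separated by hyphens.
    return local_id.split("-")


global_id = to_global_id("1.1-1-200902.1")
round_trip = to_local_id(global_id)
parts = split_parts(round_trip)
```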
Possible Support of HTTP Protocol
[Figure: resolution flow for an ERA-HTTP request]
• ERA-HTTP request. Example: http://era.gov/1.6-1-200903.123
• HTTP Web Server: receives the HTTP request and passes the ERA-ID to the ERA ID Resolver.
• ERA ID Resolver: resolves the 1st and 2nd segments of the ERA-ID to a Content Server ID. Example: CS # 6
• Resolver in Content Server: resolves the remainder of the ERA-ID, if necessary, to a Storage ID.
• Storage: Object Physical Location.
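The two-stage resolution can be sketched as follows. Several details are assumptions made so the example runs: that the first two segments are the dot-separated values before the first hyphen (so `1.6` selects Content Server 6, matching the slide's example), that the remainder is looked up in a per-server table, and that the storage path shown is a made-up placeholder.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def resolve_content_server(era_id: str) -> Tuple[int, str]:
    """Stage 1: use the 1st and 2nd segments to pick a Content Server."""
    head, _, remainder = era_id.partition("-")
    _first, second = head.split(".")
    return int(second), remainder


class ContentServerResolver:
    """Stage 2: a Content Server resolves the rest of the ID to a location."""

    def __init__(self, locations: Dict[str, str]) -> None:
        self._locations = locations

    def resolve(self, remainder: str) -> str:
        return self._locations[remainder]


def handle_request(url: str, servers: Dict[int, ContentServerResolver]) -> str:
    # The web server extracts the ERA-ID from the URL and hands it on.
    era_id = urlsplit(url).path.lstrip("/")
    cs_id, remainder = resolve_content_server(era_id)
    return servers[cs_id].resolve(remainder)


# Hypothetical lookup table for Content Server 6.
servers = {6: ContentServerResolver({"1-200903.123": "/storage/vol3/obj123"})}
location = handle_request("http://era.gov/1.6-1-200903.123", servers)
```

Splitting the lookup this way means the top-level resolver only needs to know which Content Server owns an ID, while each Content Server keeps its storage layout private.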
Federators Global & Local
[Figure: several Content Servers, each with Metadata Management and ERA Storage, behind two Federators configured by Policy. One Federator routes each AIP (as an AAP) to the correct Content Server; the other federates queries from the Access Subsystem and returns Results and DIPs. Other labels: Data Management, Business Object Management, Description Management, Preservation, Access Working Storage (AWS).]
• Standard Operations and Interfaces:
– Put AAP
– Get AAP
– Update AAP
– Delete AAP
• Search:
– Metadata
– Asset
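The two federator roles, routing an incoming AAP to the correct Content Server and fanning a query out to all of them, can be sketched as below. The policy format (collection name to server name), the class names, and the minimal `ContentServer` stub are assumptions for illustration only.

```python
from typing import Dict, List


class ContentServer:
    """Minimal stub offering just the operations the federator needs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._aaps: Dict[str, dict] = {}

    def put(self, aap_id: str, aap: dict) -> None:
        self._aaps[aap_id] = aap

    def search(self, **criteria: str) -> List[str]:
        return sorted(
            aap_id
            for aap_id, aap in self._aaps.items()
            if all(aap["metadata"].get(k) == v for k, v in criteria.items())
        )


class Federator:
    def __init__(self, servers: Dict[str, ContentServer], policy: Dict[str, str]) -> None:
        self.servers = servers
        # Hypothetical allocation policy: collection name -> server name.
        self.policy = policy

    def route(self, aap_id: str, aap: dict) -> str:
        """Send the AAP to the server chosen by the allocation policy."""
        server_name = self.policy[aap["metadata"]["collection"]]
        self.servers[server_name].put(aap_id, aap)
        return server_name

    def federated_search(self, **criteria: str) -> List[str]:
        """Run the query on every server and merge the results."""
        hits: List[str] = []
        for server in self.servers.values():
            hits.extend(server.search(**criteria))
        return sorted(hits)


servers = {"cs1": ContentServer("cs1"), "cs2": ContentServer("cs2")}
policy = {"Federal": "cs1", "Presidential": "cs2", "Census": "cs2"}
federator = Federator(servers, policy)

routed_to = federator.route("a1", {"metadata": {"collection": "Census", "year": "1940"}})
federator.route("a2", {"metadata": {"collection": "Federal", "year": "1940"}})
hits_1940 = federator.federated_search(year="1940")
```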
Potential Access of Records in ERA
• If the requested asset is in OPA’s local storage, the asset is sent directly to the requestor.
• If the requested asset with the given URI is in ERA, the request is pooled and forwarded to the ERA system, which pushes the asset.