British Computer Society, North London Branch
Major Programmes
Richard Boulderstone
July 27, 2004
www.bl.uk
Agenda
• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions
Magna Carta
What Is The British Library?
• Created by British Library Act 1972 - commenced 1973
• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)
• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983
• Flagship building at St Pancras - largest public building project in Great Britain in the 20th century - opened in 1998
World-Class Research Library: Key Statistics 2002/3
• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full; 12 km added each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report: http://www.bl.uk/about/annual/latest.html
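As a back-of-envelope reading of the shelving figures above, the remaining headroom can be estimated in a few lines of Python. This is only a sketch: it assumes total capacity stays at 651 km and that intake stays at 12 km a year, neither of which the slide promises, and the function name is illustrative.

```python
# Figures taken from the 2002/3 statistics; the projection itself is an assumption.
CAPACITY_KM = 651.0        # total shelf capacity
FRACTION_FULL = 0.92       # shelves 92% full
GROWTH_KM_PER_YEAR = 12.0  # new material added each year


def years_until_full(capacity_km: float, fraction_full: float,
                     growth_km_per_year: float) -> float:
    """Years of headroom, assuming fixed capacity and constant intake."""
    free_km = capacity_km * (1.0 - fraction_full)
    return free_km / growth_km_per_year


free_km = CAPACITY_KM * (1.0 - FRACTION_FULL)
years = years_until_full(CAPACITY_KM, FRACTION_FULL, GROWTH_KM_PER_YEAR)
print(f"free shelving: {free_km:.1f} km, headroom: {years:.1f} years")
```

On those assumptions it prints `free shelving: 52.1 km, headroom: 4.3 years`, i.e. a little over four years' worth of intake.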
Outcome Based Vision
To help people advance knowledge to enrich lives:
… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to the UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future
Pride, Relevance, Innovation
‘The World’s Knowledge’
Audience diagram: Researcher, Business, Public, Libraries, Education
Audience groups shown: High R+D Industries, Professional Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Teachers, Students (11-18), Lifelong Learners, Visitors (child + adult), Public, Librarians, Public Libraries, H.E. Libraries, Scholars, Postgraduates/Undergraduates, Commercial Researchers, Broadcasting (e.g. BBC), Publishing (e.g. OED)
Services shown:
• On-site Visits, School Tours, Web Learning
• Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools
• Document Supply, Resource Discovery, Training, Best Practice
• Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre
• Exhibitions, Events, Tours, Publishing
Integrated Library System (ILS) Programme
Major Programmes/1
Da Vinci Notebook
ILS: Development
• Data migration: due to finish in a few days; 16M+ BL records; 10M+ records from other sources
• Online ILS software: all online changes made (mainly interfaces); final tests
• Web OPAC configuration: tested by staff, HE and expert users
• Batch imports/exports: most done for go-live; the rest in priority order
ILS: Implementation
• Training: courses for end-users well underway; ‘practice’ system available; ‘search only’ training also underway
• Testing: functional (end-to-end) testing nearly complete; performance poor, with the OPAC very slow
• Automated stress testing (LoadRunner scripts): eIS trying to find the area of the problem; Ex Libris experts flying over
• Some security ‘hardening’ needed
ILS: Cutover from legacy systems
• Now: temporary Aleph cataloguing
• 7 June: Phase 1, internal processing. Staggered take-on of users to ease cutover problems; merge ‘temporary’ records
• 30 June: Phase 2, reading rooms. Reading rooms closed for cutover 26-29 June; mainly brand-new PCs etc. rather than XP upgrade
• 30 July: Phase 3, remote users. Could be delayed if there are major problems
Future ILS development (ILS/2)
• Current ILS development seen just as the start
• Extra records, e.g. Sound archive, Manuscripts, Newspaper issues
• Extra functions, e.g. Preservation records
• Links to other new BL systems, e.g. Digital Object Management (images, web pages etc.)
• New releases of Ex Libris packages
Digitisation Programme
Major Programmes/2
International Dunhuang Project
Background
Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.
Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.
Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.
Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).
BL Has Had A Fairly Ad Hoc Approach, Driven By External Funding Opportunities And Curator Interest.
Projects Have Generally Created Their Own Approach, IT Resources And Project Management.
BL Has Created About 1.5M Digital Images So Far…
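The per-image figures above can be turned into a rough programme cost. This is a minimal sketch assuming a flat rate per image; the £10 and £1 rates are illustrative readings of "£10s" and "sub £1", and the break-even calculation further assumes, hypothetically, that a £150,000 automatic device on its own brings the rate from £10 down to £1.

```python
IMAGES_SO_FAR = 1_500_000   # "about 1.5M digital images", from the slide
DEVICE_COST_GBP = 150_000   # "~£150,000" automatic device, from the slide


def programme_cost(images: int, cost_per_image_gbp: float) -> float:
    """Total cost at a flat per-image rate."""
    return images * cost_per_image_gbp


def break_even_images(device_cost_gbp: float, old_rate: float, new_rate: float) -> float:
    """Images needed before a device's saving per image repays its price.

    Hypothetical: assumes the device alone moves the rate from old_rate to new_rate.
    """
    return device_cost_gbp / (old_rate - new_rate)


# Illustrative rates: £10 (low end of "£10s") and £1 (ceiling of "sub £1").
for rate in (10.0, 1.0):
    print(f"£{rate:.0f}/image -> £{programme_cost(IMAGES_SO_FAR, rate):,.0f}")
print(f"break-even: {break_even_images(DEVICE_COST_GBP, 10.0, 1.0):,.0f} images")
```

At those illustrative rates, the 1.5M images created so far would cost £15,000,000 at £10 each versus £1,500,000 at £1 each, and such a device would repay itself after roughly 16,667 images.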
Digitisation Strategy
Digitisation Strategy Project Was Formally Initiated On February 2, 2004
Key objectives for the project are to define:
• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS
Definitive Register of Projects: 19 Complete, 19 Current, 20 Planning
JISC Sound (3,900 Hours)
JISC Newspapers (2M Pages of 750M Pages)
Chopin (Collaborative Project)
Early English Books Online
Project Status Information
Digital Object Management (DOM) Programme
Major Programmes/3
Gutenberg Bible
DOM Programme vision
Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever
Our vision is to create a management system for digital objects that will:
• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel
Introduction - history
• Digital Library PFI: Mar 1997 – Dec 1998
• Digital Library System: 1999 – early 2002 (lessons learned)
• DOM Report: Nov 2002
• The DOM Programme started September 2003
Drivers for the BL DOM Programme
Legal deposit legislation for non-print material was granted royal assent in October 2003
• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from the early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc.
• Sound archive receives 12 TB of material per year (with 50-year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library … and … and …
We need a generic and cost-effective approach for the secure long-term storage of digital material that is produced by numerous initiatives
DOM – many topics to address
LDEP: Legal Deposit of Electronic Publications
LDLSE: Legal Deposit Libraries Secure Environment
RADM: Risk Analysis of Digital Materials
SDM: Storage of Digitised Masters
VDEP: Voluntary Deposit of Electronic Publications
Chart: DOM components plotted by component ambiguity/complexity (low to high) against estimated size of component (low to high), each marked as Started, Planned, Non-DOM project or Planned co-operation. Components shown: Technical Requirements, VDEP, SDM, RADM Strategy, Development Prototypes, LDEP, Resource Discovery Interfaces, Metadata Definition, Rights Management, Workflow, File Conversion Utilities, Persistent Identifiers, LDLSE, File Format Registry, Web Archiving, ILS, Authentication, Digitisation Programme.
Scope - life cycle of objects
• Collection: selection, acquisition, accession, description
• Preservation: storage, preservation
• Access: resource discovery, delivery, rendering
Scope – objects and processes
• Preservation store: preserves the bit stream in perpetuity
• Access store: access versions; limited formats, “in the flavour of the era”
• Metadata to support resource discovery: descriptive, administrative; links with existing tools, e.g. Integrated Library System (ILS)
• Workflow: ingest, e.g. Legal Deposit processing
Diagram: DOM Storage and DOM Resource Discovery & Delivery. Elements shown: Access, Donations, Web Archiving, Non-Serial Store, Grey Literature, Publishers’ Archives, Archiving, Operational Stores, Document Supply, NSA, Newspapers, St Pancras Studios, Digitisation, LDL Secure Environment, Legal Deposit Items, Legal Deposit Processing, Legal Deposit. Shared services: Digital Rights Management, Authentication, Metadata, Persistent ID, Signing, Ingest.
Timeline (2003-2005)
• R0, Definition: prototype will provide a basic preservation-quality digital object storage module
• Milestone: ET approve Business Case (BC) & Timeline
• R1, Operational Storage Sub System: consolidate R0 into operational system; provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP); integrate it with the existing VDEP front-end
• R2, 1st Content Stream ingest: support ingest for a major content stream; integrate with core Library systems as required
• R3 & R4, LDEP initial format: provide functionality for material covered by LDEP secondary legislation
• R5+: Open DOM to new projects
DOM: Project definition - 1
• Functional architecture: “what”. Example issues: digital rights, file formats, etc.
• Logical architecture: “how” (overall architecture). Example issues: allow changes to new suppliers; relationships to ILS, other projects etc.
• Physical architecture: “how” (storage & specifics). Example issues: how do we build it cost-effectively today; supplier selection criteria
• Prototyping: basic functioning architecture; principal solutions and options; assessing market solutions
• Cross-team workshops: reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
• Business case
• Planning: incremental implementation phases
DOM: Project definition - 2
• The approach is to be incremental and not ‘Big Bang’. We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define an overall long-term “logical architecture”, within which there will be successive generations of physical architectures
• We are understanding the storage marketplace, and we will use that knowledge to manage procurement
• We are certain that we will need >500 TB of storage but uncertain when; we thus need flexible, scalable procurement
DOM architecture - overview
Diagram elements:
• Others: Resource Discovery (ILS, non-catalogue-based resource discovery), Rights Management, LDEP, Document Supply
• Compound objects/relations
• Atomic objects
• DOM Storage Service: unique persistent identifier (DOMID), integrity, authenticity
• DOM Physical Storage
Each OBJECT is identified by its DOMID; the local resource locator maps an object’s DOMID to node/vol/LRL.
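The mapping of a DOMID to node/vol/LRL can be sketched as a small index. This is a hypothetical illustration, not the DOM design: the class and method names are invented, and a random UUID URN stands in for whatever identifier scheme the unique-ID service actually uses. What it shows is that other systems hold only the DOMID, so an object can move to new physical storage without its identifier changing.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str
    volume: str
    lrl: str  # local resource locator within the volume


class DomIndex:
    """Hypothetical DOMID -> physical location index (names are illustrative)."""

    def __init__(self) -> None:
        self._locations = {}  # DOMID -> Location

    def mint(self) -> str:
        # Stand-in for the unique-ID shared service: a random UUID as a URN.
        return f"urn:uuid:{uuid.uuid4()}"

    def register(self, domid: str, location: Location) -> None:
        if domid in self._locations:
            raise ValueError(f"{domid} already registered")
        self._locations[domid] = location

    def relocate(self, domid: str, new_location: Location) -> None:
        # Moving an object to a new storage generation or supplier changes
        # only its location; the DOMID held by ILS etc. never changes.
        if domid not in self._locations:
            raise KeyError(domid)
        self._locations[domid] = new_location

    def resolve(self, domid: str) -> Location:
        return self._locations[domid]


index = DomIndex()
domid = index.mint()
index.register(domid, Location("node1", "vol07", "objects/000123"))
index.relocate(domid, Location("node4", "vol01", "objects/9f2a"))
print(domid, "->", index.resolve(domid))
```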
DOM System
DOM System (release 3)
Diagram: Mailroom, Administration and Access components, two storage subsystems and shared services; also shown: Publishers, Aleph
DOM logical architecture – integrity and authenticity
Integrity:
• The system has the capability to continuously monitor the object store to detect object corruption
• It would then initiate object recovery
Authenticity:
• A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
• Based on the use of cryptographic signing techniques
• Each object is signed when it is ingested
• The signature is verified when required
• The signing mechanism is “tightly” controlled
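The sign-at-ingest, verify-on-demand pattern, together with a corruption scan and recovery from a replica, might look like the sketch below. It is an assumption-laden illustration: the slides specify cryptographic signing, whereas this example uses Python's standard-library HMAC-SHA256 with a shared secret as a stand-in for a public-key signature, and the key, object IDs and replica dictionary are all hypothetical.

```python
import hashlib
import hmac

# Assumption: the key lives only inside the tightly controlled signing service.
# HMAC-SHA256 stands in for a real public-key signature scheme.
SIGNING_KEY = b"held-only-by-the-signing-service"


def sign(content: bytes) -> str:
    """Computed once, when the object is ingested."""
    return hmac.new(SIGNING_KEY, content, hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str) -> bool:
    """Authenticity: is the re-presented object as it was at ingest?"""
    return hmac.compare_digest(sign(content), signature)


def scan(store: dict, signatures: dict) -> list:
    """Integrity: IDs of objects whose bytes no longer match their signature."""
    return [oid for oid, data in store.items() if not verify(data, signatures[oid])]


def repair(store: dict, replica: dict, signatures: dict) -> list:
    """Recovery: replace corrupted objects with verified copies from a replica."""
    repaired = []
    for oid in scan(store, signatures):
        if verify(replica[oid], signatures[oid]):
            store[oid] = replica[oid]
            repaired.append(oid)
    return repaired


store = {"obj-1": b"Magna Carta image", "obj-2": b"Gutenberg Bible image"}
signatures = {oid: sign(data) for oid, data in store.items()}  # at ingest
replica = dict(store)                                         # copy held elsewhere
store["obj-2"] = b"Gutenberg Bible imagf"                     # simulated bit rot
print(scan(store, signatures))               # ['obj-2']
print(repair(store, replica, signatures))    # ['obj-2']
print(scan(store, signatures))               # []
```

A public-key scheme would let any component verify a signature without being able to create one, which fits the "tightly controlled" signing mechanism better than a shared secret does.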
Procuring physical storage in volume
• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on a rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
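To see why buying just ahead of demand pays when prices fall 30-40% a year, compare purchasing a fixed total up front with buying each year's requirement at that year's price. The demand profile (500 TB spread evenly over five years) and the normalised price of 1.0 per TB are hypothetical; the model ignores warranty replacement, migration effort and the risk of under-buying.

```python
def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Buy everything now at today's price."""
    return total_tb * price_per_tb


def rolling_cost(yearly_tb: list, price_per_tb: float, annual_decline: float) -> float:
    """Buy each year's need just ahead of demand; the price falls every year."""
    cost, price = 0.0, price_per_tb
    for tb in yearly_tb:
        cost += tb * price
        price *= 1.0 - annual_decline
    return cost


demand = [100.0] * 5  # hypothetical: 500 TB needed evenly over five years
for decline in (0.30, 0.40):
    ratio = rolling_cost(demand, 1.0, decline) / upfront_cost(sum(demand), 1.0)
    print(f"{decline:.0%} annual decline: rolling purchase costs {ratio:.0%} of buying up front")
```

Under these assumptions the rolling approach costs about 55% of the up-front price at a 30% annual decline and about 46% at 40%.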
Disaster tolerance and the organisation of storage clusters
One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
However, one cannot obtain such solutions for systems of several hundred terabytes
So we must build DR into the design of the system
A single-site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
This implies that we need a multi-site solution. Conventionally these are based on a master-standby arrangement, where only 50% of the kit is delivering normal service. Our design is based on multiple autonomous, independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
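The utilisation argument can be stated with exact fractions. A minimal sketch, assuming master-standby sites come in pairs and that user load is spread evenly across peers; the second function adds a consequence the slide does not spell out, namely the extra load each surviving peer must absorb when one cluster is lost.

```python
from fractions import Fraction


def serving_fraction(design: str, clusters: int) -> Fraction:
    """Fraction of installed clusters delivering normal service."""
    if design == "master-standby":
        if clusters < 2 or clusters % 2:
            raise ValueError("master-standby assumes clusters in pairs")
        return Fraction(1, 2)   # one live site per standby site
    if design == "peer":
        if clusters < 2:
            raise ValueError("a multi-site design needs at least two clusters")
        return Fraction(1)      # every peer serves users
    raise ValueError(f"unknown design: {design}")


def load_after_failure(peers: int) -> Fraction:
    """Load on each surviving peer, relative to normal, if one peer is lost.

    Assumes user load is spread evenly across all peers.
    """
    if peers < 2:
        raise ValueError("need at least two peers")
    return Fraction(peers, peers - 1)


print(serving_fraction("master-standby", 2), serving_fraction("peer", 2))  # 1/2 1
print(load_after_failure(2), load_after_failure(3))                        # 2 3/2
```

With two peers the survivor carries double its normal load; with three, each survivor carries one and a half times, so peer clusters still need headroom even though none sits idle.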
DOM architecture in the context of the storage solution market
The dominant segment of the market focuses on delivering performance within a single, highly resilient cluster
However:
• Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
• We have resilience by using multiple clusters, hence a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
We are using these drivers to design a cost-effective, large-scale resilient solution
DOM storage subsystem architecture - overview
Diagram: DOM Shared Services (unique ID, signing, logging) and two storage clusters, each comprising a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage.
DOM storage subsystem architecture - access
Normal access/delivery is from the local storage cluster.
DOM storage subsystem architecture - access
When a cluster is off-line, access/delivery is from a remote storage cluster.
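The access rule on these two slides (deliver from the local cluster, fall back to a remote one when the local cluster is off-line) reduces to a short routing function. The `Cluster` class and `read` function are hypothetical stand-ins, not DOM interfaces.

```python
class Cluster:
    """Minimal stand-in for one storage cluster (hypothetical)."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name = name
        self.online = online
        self.objects = {}  # DOMID -> bytes


def read(domid: str, local: Cluster, remotes: list) -> tuple:
    """Deliver from the local cluster; fall back to an online remote replica."""
    for cluster in [local, *remotes]:
        if cluster.online and domid in cluster.objects:
            return cluster.name, cluster.objects[domid]
    raise LookupError(f"{domid} is not available on any online cluster")


site_a, site_b = Cluster("site-a"), Cluster("site-b")
for cluster in (site_a, site_b):
    cluster.objects["dom:42"] = b"digitised map"  # both peers hold a copy

print(read("dom:42", site_a, [site_b])[0])  # site-a
site_a.online = False                        # local cluster goes off-line
print(read("dom:42", site_a, [site_b])[0])  # site-b
```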
DOM storage subsystem architecture - ingest
Normal ingest is to the local storage cluster (signing, then store), and then the remote store is synchronised.
DOM storage subsystem architecture - ingest
When a cluster is off-line, ingest is managed by the remote storage cluster (signing, then store), and the local cluster is synchronised later.
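The ingest rule on these two slides (sign, store locally, synchronise the remote store; if the local cluster is off-line, ingest via the remote cluster and synchronise the local one later) can be sketched with a queue of pending copies. Everything here is hypothetical: `Ingest`, `Cluster` and `resync` are invented names, and a plain SHA-256 digest stands in for the shared signing service.

```python
import hashlib
from collections import deque


class Cluster:
    """Minimal stand-in for one storage cluster (hypothetical)."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name = name
        self.online = online
        self.objects = {}  # DOMID -> (bytes, signature)


def sign(data: bytes) -> str:
    # Placeholder for the shared signing service: a plain SHA-256 digest.
    return hashlib.sha256(data).hexdigest()


class Ingest:
    def __init__(self, local: Cluster, remote: Cluster) -> None:
        self.local, self.remote = local, remote
        # DOMIDs each cluster still needs to receive.
        self.pending = {local.name: deque(), remote.name: deque()}

    def _peer(self, cluster: Cluster) -> Cluster:
        return self.remote if cluster is self.local else self.local

    def ingest(self, domid: str, data: bytes) -> str:
        target = self.local if self.local.online else self.remote
        if not target.online:
            raise RuntimeError("no storage cluster available for ingest")
        record = (data, sign(data))             # sign, then store
        target.objects[domid] = record
        peer = self._peer(target)
        if peer.online:
            peer.objects[domid] = record        # synchronise the other store now
        else:
            self.pending[peer.name].append(domid)  # ... or once it is back
        return target.name

    def resync(self) -> None:
        """Copy queued objects to clusters that have come back on-line."""
        for cluster in (self.local, self.remote):
            queue = self.pending[cluster.name]
            while cluster.online and queue:
                domid = queue.popleft()
                cluster.objects[domid] = self._peer(cluster).objects[domid]


local, remote = Cluster("local"), Cluster("remote")
service = Ingest(local, remote)
local.online = False
print(service.ingest("dom:7", b"web archive snapshot"))  # remote
local.online = True
service.resync()
print("dom:7" in local.objects)  # True
```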
In conclusion
We plan for generations of physical storage:
• Migration from one generation to the next
• Allow changes of supplier
• Purchase incrementally in modest quantities
• Move quickly when required
• Be cost conscious
We provide assurance that an object is held and re-presented as when it was ingested
We are designing a cost-effective large scale resilient solution
In summary: we take a long-term view
Web Archiving Programme
Major Programmes/4
Structure of Programme
The Web Archiving Programme is a collaborative initiative, implemented roughly across two consortia:
• UK Web Archiving Consortium: developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium: developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
UK Web Archiving Consortium
• Developing a selective approach to web archiving
• Licence for PANDAS about to be signed with NLA; sub-licences with consortium partners and contractor to follow
• ITT concluded, with Magus Research winning the contract
• Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
• Provide customisation/development of PANDAS
• Provide help desk and support
International Internet Preservation Consortium
Developing advanced web archiving technologies
• Smart Crawler: continuous adaptive crawler, adjusting crawl priority on the fly; based on IA Heritrix; working on requirements now; expect to begin tender process in June
• Content Management: archival formats
• Framework
• Metrics and Test Bed
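The slide describes the smart crawler only as adjusting crawl priority on the fly. One simple way to do that, shown here purely as a hypothetical illustration and not as Heritrix's actual policy, is a priority queue of next-visit times in which pages that changed since the last crawl are revisited sooner and unchanged pages back off. The interval bounds, units and URLs are assumptions.

```python
import heapq


class AdaptiveScheduler:
    """Hypothetical revisit policy; not Heritrix's actual algorithm."""

    def __init__(self, min_interval: float = 1.0, max_interval: float = 64.0) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = {}  # URL -> current revisit interval (e.g. days)
        self.queue = []     # heap of (next_due_time, URL)

    def add(self, url: str, now: float = 0.0, interval: float = 8.0) -> None:
        self.interval[url] = interval
        heapq.heappush(self.queue, (now, url))

    def next_due(self) -> tuple:
        return heapq.heappop(self.queue)

    def record_visit(self, url: str, now: float, changed: bool) -> None:
        # Changed since the last crawl: come back sooner. Unchanged: back off.
        factor = 0.5 if changed else 2.0
        new_interval = self.interval[url] * factor
        new_interval = min(self.max_interval, max(self.min_interval, new_interval))
        self.interval[url] = new_interval
        heapq.heappush(self.queue, (now + new_interval, url))


crawler = AdaptiveScheduler()
crawler.add("http://news.example.org/")
crawler.add("http://static.example.org/")

due, url = crawler.next_due()                 # news site first (tie broken by URL)
crawler.record_visit(url, due, changed=True)
due, url = crawler.next_due()                 # static site
crawler.record_visit(url, due, changed=False)
print(crawler.next_due())  # (4.0, 'http://news.example.org/')
```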
External Collaboration
Digital Library Collaborations/Partnerships: Current
• UK Digital Preservation Coalition: founder member
• TEL (The European Library Project)
• Web Archiving UK: JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library of Wales
• International Internet Preservation Consortium: BNF, Library of Congress, Internet Archive, National Archives & Library of Canada, National Library of Australia, National Library of Italy, National Libraries of Nordic Countries
• JISC-funded Digital Curation Centre
• Persistent Identifiers: DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery: Union Catalogues (SUNCAT)
• Digital Library Federation
Digital Library Collaborations/Partnerships: Potential
• Secure Legal Deposit Network: 6 Legal Deposit Libraries
• Global Digital Format Registry: potential partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration): KB (Netherlands National Library) & other partners, FP6 bid
• Digital Rights Management: potential partners (publishers, JISC)
• Metadata: publishers, others?
• Authentication: JISC?
• Resource Discovery: search engine vendors, researchers
• Others???
Conclusions
• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success
• Can You Work With Us?