British Computer Society North London Branch Major Programmes Richard Boulderstone July 27, 2004


British Computer Society
North London Branch

Major Programmes

Richard Boulderstone
July 27, 2004

Agenda

• The British Library
• Vision
• Our Audiences/Customers
• ILS
• Digitisation
• Digital Object Management
• Web Archiving
• Collaboration
• Conclusions

Magna Carta

What Is The British Library?

• Created by British Library Act 1972 - commenced 1973

• Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961)

• Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983

• Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998

World-Class Research Library: Key Statistics 2002/3

• 150 million items
• 8.2 million items consulted or supplied
• 408,000 reading room visits
• 618,000 catalogue records created
• 554,000 items received on legal deposit
• 651 km shelf capacity, 92% full, adding 12 km each year
• 18.5M Web Site Hits (www.bl.uk)
• 2,400 staff
• £85.2 million Grant in Aid and £27.0 million trading income in 2001/2
• Annual report - http://www.bl.uk/about/annual/latest.html
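The shelf figures above imply a limited runway. A minimal sketch of that arithmetic, assuming the 12 km per year is net growth in occupied shelving and that the rate stays constant (neither assumption is stated on the slide):

```python
def years_until_full(capacity_km: float, fraction_full: float,
                     growth_km_per_year: float) -> float:
    """Years of free shelving left at a constant growth rate."""
    free_km = capacity_km * (1.0 - fraction_full)
    return free_km / growth_km_per_year


# Figures from the slide: 651 km capacity, 92% full, 12 km added per year.
years = years_until_full(651, 0.92, 12)
print(f"Roughly {years:.1f} years of shelf space remain")
```

On these assumptions about 52 km of shelving is free, which is a little over four years of growth.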

Outcome Based Vision

‘The World’s Knowledge’

To help people advance knowledge to enrich lives

… by aiding scientific advances
… by adding commercial value for businesses
… by contributing to UK “knowledge economy”
… through the pursuit of academic excellence
… through the stimulation of ideas
… by adding to personal and family history
… through increasing the nation’s cultural wellbeing
… by giving information relevant to their interests
… by helping to find the next medical breakthrough
… by creating a link between the past, present and future

Diagram labels: Pride, Relevance, Innovation

[Diagram: Our Audiences/Customers]

Audience segments: Researcher, Business, Public, Libraries, Education

Audience groups shown: High R+D Industries; Prof. Services; Creative Industries; Publishing Industries; SMEs; School Libraries; Teachers; Students 11>18; Lifelong Learner; Visitors (child + adult); Librarians; Public Libraries; Public; H.E. Libraries; Scholars; Postgraduate/Undergraduate; Commercial Researcher; Broadcasting (e.g. BBC); Publishing (e.g. OED)

Service groups shown:
• On-site Visits, School Tours, Web Learning
• Reading Rooms, Bespoke Services, Reprographics, Publishing, Document Supply, Searching Tools
• Document Supply, Resource Discovery, Training, Best Practice
• Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre
• Exhibitions, Events, Tours, Publishing


Integrated Library System (ILS) Programme

Major Programmes/1

Da Vinci Notebook

ILS: Development

• Data migration
  - Due to finish in a few days
  - 16M+ BL records
  - 10M+ records from other sources
• Online ILS software
  - All online changes made (mainly interfaces) – final tests
  - Web OPAC configuration – tested by staff, HE, expert
• Batch imports / exports
  - Most done for go-live
  - Rest in priority order

ILS: Implementation

• Training
  - Courses to end-users well underway
  - ‘Practice’ system available
  - ‘Search only’ training also underway
• Testing
  - Functional testing (end to end) nearly complete
  - Performance poor – OPAC very slow
    - Automated stress testing (LoadRunner scripts)
    - eIS trying to find area of problem
    - Ex Libris experts flying over
  - Some security ‘hardening’ needed

ILS: Cutover from legacy systems

• Now: Temporary Aleph cataloguing
• 7 June: Phase 1 – internal processing
  - Staggered take-on of users to ease cutover problems
  - Merge ‘temporary’ records
• 30 June: Phase 2 – reading rooms
  - Reading rooms closed for cutover 26-29 June
  - Mainly brand-new PCs etc rather than XP upgrade
• 30 July: Phase 3 – remote users
  - Could be delayed by major problems


Future ILS development (ILS/2)

• Current ILS development seen just as the start
• Extra records
  - E.g. Sound archive, Manuscripts, Newspaper issues
• Extra functions
  - E.g. Preservation records
• Links to other new BL systems
  - E.g. Digital Object Management (images, web pages etc)
• New releases of Ex Libris packages


Digitisation Programme

Major Programmes/2

International Dunhuang Project


Background

Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates.

Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation.

Overall Cost Of Digitisation Projects Is Declining, From £10s/Image to Sub £1/Image.

Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000).

BL Has Had A Fairly Ad Hoc Approach Driven By External Funding Opportunities And Curator Interest

Projects Have Generally Created Their Own Approach, IT Resources, Project Management

BL Has Created About 1.5M Digital Images So Far…


Digitisation Strategy

Digitisation Strategy Project Was Formally Initiated On February 2, 2004

Key objectives for the project are to define:

• Selection Criteria
• Uniform Approach
• Communications Plan
• Sustainability
• Intellectual Property Rights
• External Relationship Management
• Funding
• Integration with DOMS

Project Status Information

• Definitive Register of Projects
  - 19 Complete
  - 19 Current
  - 20 Planning
• JISC Sound (3,900 Hours)
• JISC Newspapers (2M Pages of 750M Pages)
• Chopin (Collaborative Project)
• Early English Books Online


Digital Object Management (DOM) Programme

Major Programmes/3

Gutenberg Bible


DOM Programme vision

Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever

Our vision is to create a management system for digital objects that will:

• store and preserve any type of digital material in perpetuity
• provide access to this material to users with appropriate permissions
• ensure that the material is easy to find
• ensure that users can view the material with contemporary applications
• ensure that users can, where possible, experience material with the original look-and-feel

Introduction - history

• Digital Library PFI: Mar 1997 – Dec 1998
• Digital Library System: 1999 – early 2002
  - Lessons
• DOM Report: Nov 2002
• The DOM Programme: Started September 2003


Drivers for the BL DOM Programme

Legal deposit legislation for non-print material was granted royal assent in October 2003

• Existing voluntary deposit scheme operational since 2000
• Storage of digitised masters from early ’90s onwards
• New digitisation initiatives: newspapers, sound, etc
• Sound archive receives 12T of material per year (with 50 year collection)
• Web archiving
• Cartography and datasets
• Electronic journals, picture library
• … and …. …. and ….

We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives

DOM – many topics to address

• LDEP: Legal Deposit of Electronic Publications
• LDLSE: Legal Deposit Libraries Secure Environment
• RADM: Risk Analysis of Digital Materials
• SDM: Storage of Digitised Masters
• VDEP: Voluntary Deposit of Electronic Publications

[Chart: components plotted by estimated size of component (low to high) against component ambiguity / complexity (low to high)]

Components shown: Technical Requirements; VDEP; SDM; RADM Strategy; Development; Prototypes; LDEP; Resource Discovery; Interfaces; Metadata Definition; Rights Management; Workflow; File Conversion Utilities; Persistent Identifiers; LDLSE; File Format Registry; Web Archiving; ILS; Authentication; Digitisation Programme

Legend: Started; Planned; Non-DOM projects; Planned co-operation

Scope - life cycle of objects

• Collection
  - Selection
  - Acquisition
  - Accession
  - Description
• Preservation
  - Storage
  - Preservation
• Access
  - Resource discovery
  - Delivery
  - Rendering

Scope – objects and processes

• Preservation store
  - Preserves the bit stream in perpetuity
• Access store
  - Access versions
  - Limited formats – in the flavour of the era
• Metadata to support resource discovery
  - Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS)
• Workflow
  - Ingest, e.g. Legal Deposit processing

[Diagram: content streams feeding DOM ingest, storage and access]

Labels shown:
• Content streams: Donations; Web Archiving; Document Supply; Digitisation; Legal Deposit
• Sources and stores: Non-Serial Store; Grey Literature; Publishers Archives; Archiving Operational Stores; NSA; Newspapers; St Pancras Studios; LDL Secure Environment; Legal Deposit Items; Legal Deposit Processing
• DOM components: Ingest; DOM Storage; DOM Resource Discovery; Delivery; Access
• Shared services: Digital Rights Management; Authentication; Metadata; Persistent ID; Signing

Timeline (2003 to 2005)

• BC: ET approve Business Case & Timeline
• R0: Definition. Prototype will provide a basic preservation-quality digital object storage module
• R1: Operational Storage Sub System
  - Consolidate R0 into operational system
  - Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP)
  - Integrate it with the existing VDEP front-end
• R2: 1st Content Stream ingest
  - Support ingest for a major content stream
  - Integrate with core Library systems as required
• R3 & R4: LDEP - initial format. Provide functionality for material covered by LDEP secondary legislation
• R5+: Open DOM to new projects

DOM: Project definition - 1

• Functional Architecture “What”
• Logical architecture “how – overall architecture”
• Physical architecture “how – storage & specifics”

Example issues

• digital rights, file formats, etc
• allow changes to new suppliers, relationships to ILS, other projects etc
• how do we build it cost-effectively today, supplier selection criteria

Activities

• Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
• Prototyping - basic functioning architecture
• Prototyping - principal solutions and options
• Prototyping – assessing market solutions
• Business case
• Planning – incremental implementation phases

DOM: Project definition - 2

• Approach is to be incremental and not ‘Big Bang’
• We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution
• A principal goal is to define:
  - An overall long term “logical architecture”
  - Within which, there will be successive generations of physical architectures
• We are building an understanding of the storage marketplace, and we will use the knowledge to manage procurement
• We are certain that we will need >500T of storage but we are uncertain when – we thus need flexible scalable procurement

DOM architecture - overview

[Diagram: client systems call the DOM Storage Service, which sits on top of DOM Physical Storage]

• Client systems: Resource Discovery; ILS; Non-cat based RD; Rights Management; LDEP; Doc supply; Others
• DOM Storage Service
  - Unique persistent identifier (DOMID)
  - Integrity
  - Authenticity
  - Compound objects/relations
  - Atomic objects
  - Maps DOMID to OBJECT
• DOM Physical Storage
  - Local resource locator to Object
  - DOMID is mapped to node/vol/LRL
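The DOMID to node/vol/LRL mapping on the slide above can be pictured as a small lookup table. The following is only an illustrative sketch: the class names, the use of a random UUID as the identifier, and the `relocate` method are assumptions, not the Library's design.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Location:
    node: str    # storage node holding the object
    volume: str  # volume on that node
    lrl: str     # local resource locator within the volume


class DomIdResolver:
    """Maps a persistent identifier to the object's current physical location."""

    def __init__(self) -> None:
        self._locations: Dict[str, Location] = {}

    def register(self, location: Location) -> str:
        dom_id = uuid.uuid4().hex  # hypothetical identifier scheme
        self._locations[dom_id] = location
        return dom_id

    def resolve(self, dom_id: str) -> Location:
        return self._locations[dom_id]

    def relocate(self, dom_id: str, new_location: Location) -> None:
        """Move an object to new storage; the DOMID itself never changes."""
        if dom_id not in self._locations:
            raise KeyError(dom_id)
        self._locations[dom_id] = new_location


resolver = DomIdResolver()
dom_id = resolver.register(Location("node-a", "vol-01", "0001/page.tif"))
resolver.relocate(dom_id, Location("node-b", "vol-07", "0001/page.tif"))
```

Keeping the identifier separate from the location is what lets physical storage be replaced without client systems such as the ILS having to update their references.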

DOM System

[Diagram: DOM System (release 3)]

Components shown: Mailroom; Administration; Access; Storage subsystem (two instances); Shared services

External systems shown: Publishers; Aleph

DOM logical architecture – integrity and authenticity

• Integrity:
  - System has capability to continuously monitor the object store to detect object corruption
  - It would then initiate object recovery
• Authenticity:
  - A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
  - Based on the use of cryptographic signing techniques
  - Each object is signed when it is ingested
  - The signature is verified when required
  - The signing mechanism is “tightly” controlled
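A minimal sketch of sign-at-ingest and verify-on-demand follows. The slide does not name a signing scheme, so HMAC-SHA256 from the Python standard library is used here purely as a stand-in; the key, the `ObjectStore` class and its method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical key for illustration only; a real system would keep the key
# under tight control, as the slide requires.
SIGNING_KEY = b"demo-key-not-for-production"


def sign(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed SHA-256 signature (HMAC) of the object's bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bytes still match the signature made at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


class ObjectStore:
    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._signatures: Dict[str, str] = {}

    def ingest(self, dom_id: str, data: bytes) -> None:
        """Store the object and sign it at the moment of ingest."""
        self._objects[dom_id] = data
        self._signatures[dom_id] = sign(data)

    def scan(self) -> List[str]:
        """Return identifiers of objects that no longer verify."""
        return [dom_id for dom_id, data in self._objects.items()
                if not verify(data, self._signatures[dom_id])]
```

A periodic call to `scan()` plays the role of the continuous integrity monitor; any identifier it returns would be a candidate for recovery from another copy.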

Procuring physical storage in volume

• A major cost is in physical storage
• The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible
• We thus need flexibility to change supplier over time
• Cost of storage is reducing by 30-40% per year
• Hence procure on rolling basis just ahead of demand
• Replace storage on a rolling basis on expiry of warranty
• The rolling programmes imply the need to be able to support a heterogeneous product solution
• The design of the logical architecture thus supports storage sourced from multiple storage vendors
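The case for buying just ahead of demand can be checked with simple arithmetic. The sketch below compares buying peak capacity up front with buying each year's shortfall at that year's price. The demand curve and the starting price are invented for illustration; only the 30-40% annual price decline comes from the slide (35% is used), and warranty replacement is ignored.

```python
from typing import Sequence


def upfront_cost(demand_tb: Sequence[float], price_per_tb: float) -> float:
    """Buy the peak capacity in year 0 at the year-0 price."""
    return max(demand_tb) * price_per_tb


def rolling_cost(demand_tb: Sequence[float], price_per_tb: float,
                 annual_decline: float) -> float:
    """Buy only each year's shortfall, at that year's (lower) price."""
    total = 0.0
    owned = 0.0
    for year, needed in enumerate(demand_tb):
        price = price_per_tb * (1.0 - annual_decline) ** year
        extra = max(0.0, needed - owned)
        total += extra * price
        owned += extra
    return total


# Hypothetical inputs: demand in TB per year and a starting price per TB.
demand = [100, 200, 350, 500]
upfront = upfront_cost(demand, 1000.0)
rolling = rolling_cost(demand, 1000.0, 0.35)
print(f"up front: {upfront:,.0f}  rolling: {rolling:,.0f}")
```

With these made-up inputs the rolling approach costs a little over half the up-front purchase, which is the effect the slide is pointing at.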

Disaster tolerance and the organisation of storage clusters

• One can obtain commercial disaster recovery (DR) solutions for common equipment configurations
• However one cannot obtain such solutions for multi-100 Tb systems
• So we must build the need for DR into the design of the system
• A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable
• This implies that we need a multi-site solution
• Conventionally these are based on a master-standby where only 50% of kit is delivering normal service
• Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service

DOM architecture in the context of the storage solution market

• The dominant segment of the market focuses on delivering performance within a highly resilient single cluster
• However:
  - Many of our objects will be rarely accessed, so we do not want to pay for “maximised” performance we do not need
  - We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster, so we do not want to pay for “maximised” resilience we do not need
• We are using these drivers to design a cost-effective large scale resilient solution

DOM storage subsystem architecture - overview

[Diagram: DOM Shared Services (Unique ID, Signing, Logging) alongside two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]

DOM storage subsystem architecture - access

[Diagram: DOM central and DOM Shared Services (Unique ID, Signing, Logging) alongside two storage clusters, each made up of a DOM Storage gateway, a DOM Storage Service and DOM Physical Storage]

Normal access/delivery is from local storage cluster

DOM storage subsystem architecture - access

[Diagram: the same two storage clusters, DOM central and DOM Shared Services (Unique ID, Signing, Logging), with one cluster off-line]

When a cluster is off-line then access/delivery is from a remote storage cluster
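The two access slides describe a local-first read with a remote fallback. A minimal sketch of that rule follows; the `Cluster` class, the use of `ConnectionError` to signal an off-line cluster, and the `fetch` function are all hypothetical names for illustration.

```python
from typing import Dict, Sequence, Tuple


class Cluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}

    def read(self, dom_id: str) -> bytes:
        if not self.online:
            raise ConnectionError(f"{self.name} is off-line")
        return self.objects[dom_id]


def fetch(dom_id: str, local: Cluster,
          remotes: Sequence[Cluster]) -> Tuple[bytes, str]:
    """Serve from the local cluster; fall back to a remote one if it is down."""
    for cluster in [local, *remotes]:
        try:
            return cluster.read(dom_id), cluster.name
        except ConnectionError:
            continue
    raise ConnectionError("no storage cluster is available")


site_a, site_b = Cluster("site-a"), Cluster("site-b")
site_a.objects["obj-1"] = site_b.objects["obj-1"] = b"payload"
```

Because the clusters are peers holding the same objects, the fallback needs no promotion step: the remote cluster simply answers the request.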

DOM storage subsystem architecture - ingest

[Diagram: the same two storage clusters, DOM central and DOM Shared Services (Unique ID, Signing, Logging), with ingest steps marked: Signing, Store, Synchronise remote store]

Normal ingest is to the local storage cluster and then the remote cluster is synchronised

DOM storage subsystem architecture - ingest

[Diagram: the same two storage clusters, DOM central and DOM Shared Services (Unique ID, Signing, Logging), with ingest steps marked: Signing, Store, Synchronise remote store later]

When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later
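The two ingest slides can be read as: write to whichever peer is available, copy to the other at once if it is up, otherwise remember the object and copy it later. The sketch below illustrates that reading under stated assumptions; the class, the backlog list and the function names are hypothetical, and the signing step shown on the slides is left out for brevity.

```python
from typing import Dict, List


class StorageCluster:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.backlog: List[str] = []  # objects the peer has not yet received


def ingest(dom_id: str, data: bytes,
           local: StorageCluster, remote: StorageCluster) -> str:
    """Store on the first available peer; return the name of that peer."""
    primary, secondary = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise ConnectionError("no storage cluster is available")
    primary.objects[dom_id] = data
    if secondary.online:
        secondary.objects[dom_id] = data   # synchronise straight away
    else:
        primary.backlog.append(dom_id)     # synchronise later
    return primary.name


def resynchronise(source: StorageCluster, target: StorageCluster) -> int:
    """Copy backlogged objects once the target is back; return the count."""
    if not target.online:
        return 0
    copied = 0
    for dom_id in source.backlog:
        target.objects[dom_id] = source.objects[dom_id]
        copied += 1
    source.backlog.clear()
    return copied
```

A real design would also have to handle conflicting writes and failures during synchronisation, which this sketch ignores.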

In conclusion

• We plan for generations of physical storage
  - Migration from one generation to the next
  - Allow changes of supplier
  - Purchase incrementally in modest quantities
  - Move quickly when required
  - Be cost conscious

We provide assurance that an object is held and re-presented as when it was ingested

We are designing a cost-effective large scale resilient solution

In summary: we take a long term view


Web Archiving Programme

Major Programmes/4

Structure of Programme

• Web Archiving Programme is a collaborative initiative, roughly implemented across two consortiums
• UK Web Archiving Consortium
  - Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity
• International Internet Preservation Consortium
  - Developing advanced web archiving technologies for the long-term, large scale, continuous crawling requirements enabled through legislation

UK Web Archiving Consortium

• Developing a selective approach to web archiving
• License for PANDAS about to be signed with NLA
• Sub-licenses with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  - Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  - Provide customisation/development of PANDAS
  - Provide help desk and support

International Internet Preservation Consortium

• Developing advanced web archiving technologies
• Smart Crawler
  - Continuous adaptive crawler, adjusting crawl priority on the fly
  - Based on IA Heritrix
  - Working on requirements now
  - Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
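One way to picture "adjusting crawl priority on the fly" is a priority queue whose entries are re-scored after each visit. The sketch below is a generic illustration using Python's `heapq`; it is not Heritrix code, and the doubling/halving rule and all names are assumptions.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Priority queue of URLs whose priorities can be revised at any time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._priority: Dict[str, float] = {}  # current priority per URL
        self._counter = itertools.count()      # tie-breaker for equal scores

    def schedule(self, url: str, priority: float) -> None:
        """Add a URL, or revise its priority; older heap entries go stale."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def next_url(self) -> Optional[str]:
        """Pop the highest-priority URL, skipping stale entries."""
        while self._heap:
            neg_priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -neg_priority:
                del self._priority[url]
                return url
        return None

    def record_visit(self, url: str, changed: bool, priority: float) -> float:
        """Re-queue a visited URL: raise priority if it changed, else lower it."""
        new_priority = min(1.0, priority * 2) if changed else priority / 2
        self.schedule(url, new_priority)
        return new_priority
```

Pages that keep changing drift towards the front of the queue and are revisited more often, while static pages are crawled less frequently.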


External Collaboration

Digital Library Collaborations/Partnerships: Current

• UK Digital Preservation Coalition
  - Founder Member
• TEL (The European Library Project)
• Web Archiving
  - UK: JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales
  - International Internet Preservation Consortium: BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries
• JISC Funded - Digital Curation Centre
• Persistent Identifiers
  - DOI Foundation, European National Libraries (KB & DDB)
• Resource Discovery
  - Union Catalogues (SUNCAT)
• Digital Library Federation

Digital Library Collaborations/Partnerships: Potential

• Secure Legal Deposit Network
  - 6 Legal Deposit Libraries
• Global Digital Format Registry
  - Potential Partners (National Archives, DLF)
• Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration)
  - KB (Netherlands National Library & Other Partners – FP6 Bid)
• Digital Rights Management
  - Potential Partners (Publishers, JISC)
• Metadata
  - Publishers, Others?
• Authentication
  - JISC?
• Resource Discovery
  - Search Engine Vendors, Researchers
• Others???

Conclusions

• Beautiful Building!
• Market & Outcome Focus
• Huge IT Agenda
• Collaboration Is Critical To Our Success

• Can You Work With Us?