Digital Archiving at Elsevier
Joep Verheggen, ScienceDirect
ICSTI Conference, London, 17 May 2004

Agenda

• Short introduction to Elsevier

• Archiving: why is it so important, and what is our position?

• “YOAS” project

• “Technical aspects”

Note: this presentation focuses on journal content

Elsevier vision...

…to deliver superior information products and services that provide solutions for scientists, medical professionals and librarians ...

Archiving terminology

• talk of “archives” can confuse two things:
  – (1) ongoing access to current services, and
  – (2) long-term storage and preservation of the intellectual content

• we provide for both in our licenses

• this presentation relates primarily to (2)

Long-term preservation

• significance of going “e-only”

• many university and corporate libraries have cancelled paper and use electronic only, and the number is increasing weekly

• e-only puts greater pressure on archival preservation, and on archiving of both the print and the electronic versions

• archiving is high on the agenda of individual libraries and library groups

Responsibility for archiving

• Elsevier takes digital archiving seriously
  – responsibility to authors
  – responsibility for maintaining “the minutes of science”
  – importance to the library community
  – interest in maintaining an asset

Broad range of actions

• have participated in discussions, projects and committees related to digital archiving since 1995

• among the first (after AIP) to make a public archiving commitment, and perhaps the first to incorporate it in our license

• currently making a multi-million dollar investment in internal back-up systems

Current license language

• since 1999, all ScienceDirect licenses for the online service contain an annex specifying:
  – we will maintain a permanent archive of the SD journals we own
  – we will migrate the archive as the technology used for storage or access changes
  – we will transfer the archive to an independent, librarian-approved depository if we cannot maintain it

Sizing the problem

• there are more than 1800 Elsevier journals on ScienceDirect

• we are retrodigitizing: creating digital backfiles from v. 1, n. 1 on all titles

• expect to have more than 6 million articles on ScienceDirect by the end of this year

• original size estimate of total file: 50 million pages, 6.5 to 7 terabytes

• Project started in 2001, completed in 2004
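
As a rough sanity check on the size estimate above, the figures imply an average of roughly 130 to 140 KB per page. The sketch below only redoes that arithmetic. It assumes decimal terabytes (10^12 bytes), which the slide does not specify, and the function name is illustrative.

```python
# Back-of-the-envelope check of the size estimate quoted on the slide.
PAGES = 50_000_000      # "50 million pages" (from the slide)
BYTES_PER_TB = 10**12   # assumption: decimal terabytes; the slide does not say


def bytes_per_page(total_tb: float, pages: int = PAGES) -> float:
    """Average bytes per page implied by a total archive size in terabytes."""
    return total_tb * BYTES_PER_TB / pages


low = bytes_per_page(6.5)   # lower bound of the estimate
high = bytes_per_page(7.0)  # upper bound of the estimate
print(f"{low / 1000:.0f} to {high / 1000:.0f} KB per page")
```

Run as written, this prints `130 to 140 KB per page`.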

Types of archives

• internal production “archive”: the Electronic Warehouse, not ScienceDirect

• “de facto archives”: about 10 regular ScienceDirect OnSite (SDOS) customers worldwide who get everything or nearly everything for local loading (but make no archiving commitment beyond their constituency)

Types of archives -- continued

• self-designated “national” archives: libraries or other institutions that choose to maintain an archival copy locally as a national security measure; a variation on the SDOS license

• “official Elsevier archive”: a formal, contractual relationship between Elsevier and a trusted archival institution to provide permanent retention of, and access to, the digital files for future generations

Official Elsevier archives

• we did an investigative project with Yale University Library (with funding from the Mellon Foundation) which was completed in early 2002

• signed the first formal agreement for an official archive with the Koninklijke Bibliotheek (KB) in August, 2002

• likely to do 3-4 additional agreements (in North America, Asia and Europe)

Koninklijke Bibliotheek

• a recognized international leader in digital archiving investigations

• fortunately, also our national library

• Elsevier was already sending electronic files for its 351 Dutch imprint journals

• now expanding to the entire 1,800-title journal list, which the KB will archive “forever”

Official archive contract terms

• contract is different from a normal license for SD
  – perpetual nature of an archive
  – service level agreement
  – trigger events -- public access
  – financial terms
  – format for submission
  – comprehensiveness of archive (e.g., handling of “withdrawn” material)

• as standards for archival repositories develop, KB must meet these

Use of the official archives

• available for walk-in users now

• available remotely to anyone in the event we exit the business and no one else takes over

• in the event of a disaster that would result in ScienceDirect being down for a prolonged period, all libraries holding the journals (archives or SDOS) would be invited to open access to all (no access controls)

“Technical aspects”; LOCKSS principle

Hardware

• Dayton hosting system is located in a bunker that is proof against tornadoes, earthquakes, and aircraft impact

• Daily incremental backups, weekly complete backups

• Off-site copies of backups, extensive recovery procedures in place

• Migration to new hardware formats on every new version release
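
The backup rotation listed above can be sketched as follows. This is an illustrative model, not Elsevier's actual tooling. Two things are assumed that the slide does not state: that the weekly complete backup runs on Sunday, and that each incremental captures changes since the previous backup of either kind. All names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the weekly complete backup runs on Sunday (Monday == 0 in Python).
FULL_BACKUP_WEEKDAY = 6


def backup_kind(day: date) -> str:
    """Return 'full' on the weekly backup day, 'incremental' on any other day."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def restore_chain(target: date) -> list[tuple[date, str]]:
    """Backups needed to restore the state as of `target`:
    the most recent full backup, then every incremental taken after it."""
    days_since_full = (target.weekday() - FULL_BACKUP_WEEKDAY) % 7
    last_full = target - timedelta(days=days_since_full)
    chain = [(last_full, "full")]
    for offset in range(1, days_since_full + 1):
        chain.append((last_full + timedelta(days=offset), "incremental"))
    return chain
```

For example, `restore_chain(date(2004, 5, 19))` returns the full backup of Sunday 16 May 2004 followed by the incrementals of 17, 18 and 19 May. Under this scheme a restore never needs more than one full backup plus six incrementals, which is the usual trade-off between daily backup size and restore effort.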

“Technical aspects” – continued

Software: all formats are generally accepted standards/formats (developed to last and/or easy to migrate)

• Text: full SGML, migrating to XML this year
  – Older content: “head & tail” in SGML/XML

• Text: PDF (derived from PostScript file)
  – Older content: laser printer quality (300 dpi scanning)

• Images: TIFF, JPEG, GIF (for web applications)

• Multi-media files: we support a small number of formats that will be usable in coming decades

Thank you!