Digital Archiving at Elsevier
Joep Verheggen, ScienceDirect
ICSTI Conference, London, 17 May 2004
Agenda
• Short introduction about Elsevier
• Archiving: why it is so important, and what our position is
• “YOAS” project
• “Technical aspects”
Note: this presentation focuses on journal content
Elsevier vision...
…to deliver superior information products and services that provide solutions for scientists, medical professionals and librarians ...
Archiving terminology
• there can be confusion when talking of archives between:
– (1) ongoing access to current services, and
– (2) long-term storage and preservation of the intellectual content
• we provide for both in our licenses
• this presentation primarily relates to (2)
Long-term preservation
• significance of going “e-only”
• many university and corporate libraries have cancelled paper and use electronic only -- and this is increasing weekly
• e-only puts greater pressure on archival preservation -- and archiving of both the print and the electronic versions
• archiving high on the agenda of individual libraries and library groups
Responsibility for archiving
• Elsevier takes digital archiving seriously
– responsibility to authors
– responsibility for maintaining “the minutes of science”
– importance to the library community
– interest in maintaining an asset
Broad range of actions
• have participated in discussions, projects and committees related to digital archiving since 1995
• among the first (after AIP) to make a public archiving commitment and perhaps the first to incorporate it in our license
• currently making multi-million dollar investment in internal back-up systems
Current license language
• since 1999, all ScienceDirect licenses for online service contain an annex specifying:
– we will maintain a permanent archive of the SD journals we own
– we will migrate the archive as the technology used for storage or access changes
– we will transfer the archive to an independent, librarian-approved depository if we cannot maintain it
Sizing the problem
• there are more than 1800 Elsevier journals on ScienceDirect
• we are retrodigitizing: creating digital backfiles from v. 1, n. 1 on all titles
• expect to have more than 6 million articles on ScienceDirect by the end of this year
• original size estimate of total file: 50 million pages, 6.5 to 7 terabytes
• Project started in 2001, completed in 2004
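As a rough check on the size estimate above, the quoted figures imply an average storage cost per page. The sketch below is only illustrative arithmetic: it assumes decimal units (1 terabyte = 10^12 bytes), which the slide does not specify, and the function name is hypothetical.

```python
# Figures quoted on the slide. Decimal units (1 TB = 10**12 bytes) are an
# assumption; the slide does not say which convention was used.
PAGES = 50_000_000
TB = 10**12


def bytes_per_page(total_terabytes: float, pages: int = PAGES) -> float:
    """Average storage per page implied by a total archive size."""
    return total_terabytes * TB / pages


low = bytes_per_page(6.5)   # lower bound of the estimate
high = bytes_per_page(7.0)  # upper bound of the estimate
print(f"{low / 1000:.0f} to {high / 1000:.0f} KB per page")
```

Under that assumption the estimate works out to roughly 130 to 140 KB per page on average.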
Types of archives
• internal production “archive”: Electronic Warehouse, not ScienceDirect
• “de facto archives”: about 10 regular ScienceDirect OnSite (SDOS) customers worldwide who get everything or nearly everything for local loading (but make no archiving commitment beyond their constituency)
Types of archives -- continued
• self-designated “national” archives: libraries or other institutions that choose to maintain an archival copy locally as a national security measure; variation on SDOS license
• “official Elsevier archive”: formal, contractual relationship between Elsevier and a trusted archival institution to provide permanent retention and access to the digital files for future generations
Official Elsevier archives
• we did an investigative project with Yale University Library (with funding from the Mellon Foundation) which was completed in early 2002
• signed the first formal agreement for an official archive with the Koninklijke Bibliotheek (KB) in August, 2002
• likely to do 3-4 additional agreements (in North America, Asia and Europe)
Koninklijke Bibliotheek
• a recognized international leader in digital archiving investigations
• fortunately, also our national library
• Elsevier was already sending electronic files for its 351 Dutch imprint journals
• now expanding to the entire 1,800-title journal list, which the KB will archive “forever”
Official archive contract terms
• contract is different from a normal license for SD
– perpetual nature of an archive
– service level agreement
– trigger events -- public access
– financial terms
– format for submission
– comprehensiveness of archive (e.g., handling of “withdrawn” material)
• as standards for archival repositories develop, KB must meet these
Use of the official archives
• available for walk-in users now
• available remotely to anyone in the event we exit the business and no one else takes over
• in the event of a disaster that would result in ScienceDirect being down for a prolonged period, all libraries holding the journals (archives or SDOS) would be invited to open access to all (no access controls)
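The access rules above can be read as a small decision table. The following is a minimal sketch of that reading, not contract language: the enum members, function name, and returned phrases are hypothetical labels chosen for illustration.

```python
from enum import Enum, auto


class Situation(Enum):
    """Hypothetical labels for the three cases described on the slide."""
    NORMAL = auto()
    PUBLISHER_EXIT_NO_SUCCESSOR = auto()
    PROLONGED_OUTAGE = auto()


def who_may_access(situation: Situation) -> str:
    """Return who may use the archived journals in a given situation."""
    if situation is Situation.PUBLISHER_EXIT_NO_SUCCESSOR:
        # Trigger event: the publisher exits and nobody takes over.
        return "anyone, remotely"
    if situation is Situation.PROLONGED_OUTAGE:
        # Holding libraries are invited, not obliged, to open access.
        return "anyone, at holding libraries that accept the invitation"
    # Default: the archive serves on-site readers only.
    return "walk-in users only"


for s in Situation:
    print(f"{s.name}: {who_may_access(s)}")
```

Note that the outage case depends on each holding library (archive or SDOS) choosing to open access, so it is modelled as conditional rather than automatic.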
“Technical aspects”; LOCKSS principle
Hardware
• Dayton hosting system is located in a bunker that is tornado-, earthquake-, and aircraft-impact-proof
• Daily incremental backups, weekly complete backups
• Off-site copies of backups, extensive recovery procedures in place
• Migration to new hardware formats with every new version release
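A daily-incremental, weekly-complete backup scheme like the one listed above can be sketched as follows. This is an illustration under stated assumptions, not a description of the Dayton system: the weekday of the complete backup (Sunday here) is invented, "incremental" is taken to mean changes since the previous backup of any kind, and both function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the weekly complete backup runs on Sunday (Monday == 0).
FULL_BACKUP_WEEKDAY = 6


def backup_type(day: date) -> str:
    """Kind of backup taken on a given day under this schedule."""
    return "complete" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def restore_chain(target: date) -> list[date]:
    """Backups needed to restore the state as of `target`:
    the most recent complete backup plus every incremental after it."""
    days_since_full = (target.weekday() - FULL_BACKUP_WEEKDAY) % 7
    start = target - timedelta(days=days_since_full)
    return [start + timedelta(days=i) for i in range(days_since_full + 1)]


for d in restore_chain(date(2004, 5, 19)):
    print(d.isoformat(), backup_type(d))
```

With this scheme a restore never needs more than seven backup sets: one complete backup and at most six incrementals.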
“Technical aspects” – continued
Software: all formats are generally accepted standards/formats (developed to last and/or easy to migrate)
• Text: full SGML, migrating to XML this year
– Older content: “Head & tail” in SGML/XML
• Text: PDF (derived from PostScript file)
– Older content: laser printer quality (300 dpi scanning)
• Images: TIFF, JPEG, GIF (for web applications)
• Multi-media files: we support a small number of formats that will be usable in coming decades
Thank you!