CBHL 2002, San Francisco
CBHL 2002: A Digitization Primer
A Digitization Primer for Botanical and Horticultural Librarians
• Chris Freeland – MBG Web and Digitization Project Coordinator
• Doug Holland – MBG Administrative Librarian
• Heather Rolen – NYBG Digitization Specialist
Why Digitize?
• Makes resources broadly available while preserving the original.
• 24/7 worldwide availability.
• Capitalize on investment in resources and technology (collections, storage, curation)
• Assimilate disparate resources
• Learn something new (It’s Fun!!)
• Pressure from above (Everyone is doing it!)
Survey Summary
• 13 humble responses!
  – Little to no experience with projects
  – Some with scanning/Photoshop
• Types of materials
  – Slides and glass plates: 6
  – Photos (electrophoresis gels?): 7
  – Printed material [loose, bound (rare books!), newspaper clippings, maps, architectural drawings, seed and nursery catalogs]: 10
  – Herbarium specimens: 2
• Inhouse image database (Annie Malley)
What we will be covering
• Audience and Users
• Goals
• Ownership
• Preservation
• Access
• Metadata
• Scanning
• Sustainability
A Framework of Guidance for Building Good Digital Collections
http://www.imls.gov/pubs/forumframework.htm
• Interoperability
• Reusability (Repurposing)
• Persistence
• Verification
• Documentation
• Respecting copyright and intellectual property law
• Think a little bigger and think about the future.
Audience and Users
• Who are your users?
  – Today
  – Future
• Lifelong Learners
• Scholar/researcher
• Students
• Business Community
Why is it important to define users?
• Guides the selection process
• Determines complexity and type of metadata
• Determines image resolution
• Determines web-site design (database or exhibit format)
• Determines equipment needs
How can you retain users and keep them coming back?
• Keep adding new content
• Create value-added content after the initial rollout
  – Lesson plans, etc.
• Create an e-mail newsletter
User Comments
• Should include a way to solicit, retain, and respond to user comments and suggestions.
  – Can tell you if you’re reaching your intended audience
  – Can provide you with wonderful comments to include in grant proposals or to show your administration:
    • “Thanks so much for sharing this. This is the internet at its best.”
    • “This is fantastic. I am most enjoying these rare books, especially the illustrations. I hope to use this with teachers in the future.”
Planning and Goals
• Have clear project goals and objectives
• Be aware that funding agencies may influence the scope of your project
• Designate a project manager.
• Identify key departments or staff
• Stay realistic (perhaps conservative) in your production promises.
• Document all changes and evolution in your project.
Ownership
• Copyright needs to be considered
• Holding doesn’t mean owning
• Is item in public domain?
  – http://www.unc.edu/~unclng/public-d.htm
  – http://cidc.library.cornell.edu/copyright/
• Modify your deed of gift to include digital distribution
• Controlling intellectual property after digitization
Selection
• Audience needs
• Good Collections
• Condition
• One or many collections or mainstreaming
• Item formats and sizes
• Metadata available or Collection condition (Activities other than scanning require 75% of project time)
• Rights
• Sensitive Issues (Skeletons??)
• Who else is doing the same or similar items?
Preservation and Digitization
• Digitization is NOT preservation
• Do not discard originals.
• Why not?
  – Media longevity
  – Software and hardware obsolescence
• Digitization does preserve the original through reduced exposure and handling.
Preserving the Original
• Handle Items Once (Scan high!)
• Consider rehousing either before or after scanning.
• Appropriate long term storage
• Remember 2/3 of project time has nothing to do with scanning.
Discovery and Access (or Scanned and Deliver)
• Online Catalog or Database
  – Subject Heading or keyword search
• Finding Aids for archival collections
• Exhibit style educational page
• Don’t forget metatags and visibility to Web search engines. (If that is one of your goals!)
Web Access and Display
• Exhibit Approach
• Database Approach
• Both
Exhibit Approach
• Pull together text, images, maps, documents, etc. to tell a story
• Value added information enhances the scanned images
• Appealing to a wide audience
Example of Exhibit Approach
• Private Passions, Public Legacy: Paul Mellon's Personal Library at the University of Virginia
Database Approach
• Give access to images through a search mechanism
  – Generally have to know something about the collection to find what you’re looking for
• Appealing to a more focused audience
  – Scholars, professionals
Example of Database Approach
• Making of America
• Google Image Search
Both Approaches
• Provide value added information to reach a wider audience
• Also give full access to the data for people who know what they want to view.
Example – MBG Rare Book Site
Design vs. Development
• Usually spend too much time discussing background colors and layout
  – Too subjective
• Should focus on
  – Search engine placement
  – Successful searches for key phrases
  – Usage statistics
“If you build it, they may not come”
• Indexing by search engines is not a given
• Great images + great metadata does not equal a popular site
• You must consider how search engines work
Indexing tips
Indexing tips – <meta> tag
• <meta name="description" content="The Missouri Botanical Garden Library presents its Rare Book Digitization Project.">
• <meta name="keywords" content="botanical illustration,rare books, herbals, engravings, illustrations, botany, botanical illustrations, medicinal plants, Desktop Wallpaper, images of medicinal plants, plant images, Jaume, Kohler">
• <META NAME="DC.Title" CONTENT="Plate 1 - Cinchona officinalis; <i>Cinchona officinalis</i> L.; quinine">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#title">
• <META NAME="DC.Creator" CONTENT="Lambert, Aylmer Bourke">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#creator">
• <META NAME="DC.Subject" CONTENT="(SCHEME=LCSH) Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae.|Graphic media : -- Copper engraving -- Uncolored -- 1797 -- England.|">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#subject">
• <META NAME="DC.Subject" CONTENT="(SCHEME=LCCS) QK495 .F270 L35 1797">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#subject">
• <META NAME="DC.Description" CONTENT="Plate 1 - Cinchona officinalis; <i>Cinchona officinalis</i> L.; quinine">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#description">
• <META NAME="DC.Publisher" CONTENT="Missouri Botanical Garden">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#publisher">
• <META NAME="DC.Contributor.CorporateName" CONTENT="Missouri Botanical Garden">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#contributor">
• <META NAME="DC.Date" CONTENT="(SCHEME=ISO8601)1998-10-01">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#date">
• <META NAME="DC.Type" CONTENT="Image.Illustration">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#type">
• <META NAME="DC.Format" CONTENT="(SCHEME=IMT) text/html">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#format">
• <LINK REL=SCHEMA.imt HREF="http://sunsite.auc.dk/RFC/rfc/rfc2046.html">
• <META NAME="DC.Identifier" CONTENT="http://ridgwaydb.mobot.org/mobot/rarebooks?referencenumber=QK495F270L351797">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#identifier">
• <META NAME="DC.Language" CONTENT="(SCHEME=ISO639-1) en">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#language">
• <META NAME="DC.Relation" CONTENT="QK495F270L351797">
• <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#relation">
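Tags like these are tedious to type by hand for every plate, so they are usually generated from a catalog record. The following is a minimal Python sketch of that idea, not the method MBG actually used: the function name dc_meta_tags and the shape of the record dictionary are assumptions made for illustration. It escapes quotation marks and angle brackets so that a value cannot break the surrounding tag.

```python
from html import escape


def dc_meta_tags(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags.

    `record` maps an element name (e.g. "Title") to one string or a
    list of strings, because Dublin Core elements are repeatable.
    """
    lines = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            # escape() with quote=True also converts " to &quot;
            lines.append(
                '<meta name="DC.%s" content="%s">'
                % (escape(element, quote=True), escape(value, quote=True))
            )
    return "\n".join(lines)


# Hypothetical record; the values echo the slide above.
record = {
    "Title": "Plate 1 - Cinchona officinalis",
    "Creator": "Lambert, Aylmer Bourke",
    "Subject": ["Cinchona", "Rubiaceae"],
    "Publisher": "Missouri Botanical Garden",
    "Date": "1998-10-01",
}
print(dc_meta_tags(record))
```

A list value such as "Subject" above yields one tag per term, which is one way to express a repeated element.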
Indexing tips - <title> tag
• Use descriptive <title> tags:
– <title>MBG Rare Books: Plate 1 - Cinchona officinalis</title>
Indexing tips - <body> text
• Use text in your page:
– A Description of the Genus Cinchona by Lambert, Aylmer Bourke
– Description of Page: Plate 1 - Cinchona officinalis (Cinchona officinalis L., quinine)
More indexing tips
• Having key phrase in all 3 (<meta>, <title>, and body text) increases your search engine rank
• Indexing robots follow links on pages
  – They will follow the hierarchy of your site
• Robots don’t:
  – Click on buttons
  – Use dropdown menus
  – Natively navigate or index Flash/multimedia content
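One way to check the "all 3" tip before publishing a page is to parse it roughly the way a simple robot would and look for the key phrase in each place. The sketch below uses only Python's standard html.parser module; the class name PhraseAudit and the function audit are hypothetical, and the check reports only whether the phrase is present, not how any search engine would rank the page.

```python
from html.parser import HTMLParser


class PhraseAudit(HTMLParser):
    """Collect the text found in <meta>, <title>, and <body>."""

    def __init__(self):
        super().__init__()
        self.meta, self.title, self.body = [], [], []
        self._where = None  # "title", "body", or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta.append(attrs.get("content") or "")
        elif tag in ("title", "body"):
            self._where = tag

    def handle_endtag(self, tag):
        if tag in ("title", "body"):
            self._where = None

    def handle_data(self, data):
        if self._where == "title":
            self.title.append(data)
        elif self._where == "body":
            self.body.append(data)


def audit(html_text, phrase):
    """Report whether `phrase` occurs in the meta tags, title, and body."""
    parser = PhraseAudit()
    parser.feed(html_text)
    parser.close()
    needle = phrase.lower()
    return {
        "meta": any(needle in value.lower() for value in parser.meta),
        "title": needle in " ".join(parser.title).lower(),
        "body": needle in " ".join(parser.body).lower(),
    }


page = (
    "<html><head>"
    "<title>MBG Rare Books: Plate 1 - Cinchona officinalis</title>"
    '<meta name="description" content="Plate 1 - Cinchona officinalis">'
    "</head><body><p>Description of Page: Plate 1 - Cinchona officinalis</p>"
    "</body></html>"
)
print(audit(page, "Cinchona officinalis"))
```

Text that is only reachable through a button, a dropdown menu, or Flash never reaches handle_data in a parser like this, which mirrors the limitation of robots listed above.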
Case Study: Köhler’s Medizinal Pflanzen
• Published 1883 – 1914
• Digitized in 1997
• Images were heavily edited and cropped
• Text was added to images
Case Study: Köhler’s Medizinal Pflanzen
• Created static HTML pages with links through site
• Created a list of current botanical names with links to illustration
• NOT technically sophisticated
• Used an Exhibit Approach
Case Study: Köhler’s Medizinal Pflanzen
• Receive more user feedback and image requests for this site than any other
• Reasons:
  – Popular content with interesting images
  – Has been online for several years
  – Simple web display that can be indexed by all search engines
Lessons learned
• DON’T:
  – spend too much time bickering over color schemes, fonts, and layout
  – confuse users and indexing robots with irregular navigation
  – ignore importance of search engine results for your content
Lessons learned
• DO:
  – spend time creating rich <meta> and <title> tags and body text
  – learn how search engines index content
  – consider display, but focus on development
Metadata and Electronic Resources
• Vast amount of information, increasing at a faster rate than is manageable
• Standards developing and evolving, using best practices
• Web-enabled search engines: many, varied in retrieval success
• Everyone’s a publisher, everyone’s a librarian
• HTML metatag structure and content are limited, which inhibits reliable searching
• Lack of subject rich terms
Metadata and Standards
• Metadata definition: data about data; data that aids in identification, description and location of networked resources
• Standard Generalized Mark-up Language (SGML), 1986
  – Structure for producing documents
  – Document Type Definition (DTD) created for each type of material or individual publication
  – SGML’s support of encoding text AND description of document in the header
Dublin Core Basics
• http://purl.oclc.org/dc/
• How it began
• Why it is important
  – Simple to create
  – Easy to understand
  – International
  – Flexible
• Descriptive, Structural and Administrative metadata
• All elements repeatable, all optional
Dublin Core Elements
• Title
• Creator
• Publisher
• Contributor
• Description
• Identifier
• Date
• Format
• Subject terms/classification
• Rights Management
• Source
• Type
• Language
• Relation
• Coverage
How MBG uses DC for a book
• Title: Icones pictae plantarum rariorum descriptionibus et observationibus illustratae / Auctore J.E. Smith, M.D. Fasc. 1-3.
• Creator: Smith, James Edward
• Subject_LCSH: Botany -- Pictorial works.
• Subject_LCCS: QK98 .S657
• Description: 2 p.l., 18 numb. l. : 18 col. pl. ; 50 cm.
• Publisher: London, 1790-93, Missouri Botanical Garden
• Contributor: Photography and Web design by Debbie Windus.
• Date: 1998-09-01
• Identifier: http://ridgwaydb.mobot.org/mobot/rarebooks/title.asp?relation=QK98S657
• Relation: QK98S657
• Rights: http://ridgwaydb.mobot.org/mobot/rarebooks/copyright.asp
How MBG uses DC for a page/image
• Title: QK495F270L351797_0060.jpg
• Creator: Lambert, Aylmer Bourke, 1761-1842
• Subject: Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae.|Graphic media : --Copper engraving -- Uncolored -- 1797 -- England.|
• Description: Plate 9 - Cinchona angustifolia
• Publisher: Missouri Botanical Garden
• Contributor: Missouri Botanical Garden
• Date: 1998-10-01
• Type: Image
• Format: jpeg
• Identifier: 0060
• Source: QK495.F270 L35 1797
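A record like this can also be stored outside the web page as a small XML file. The sketch below is one possible serialization using Python's standard xml.etree.ElementTree module and the Dublin Core element namespace http://purl.org/dc/elements/1.1/. The wrapper element name "record" and the function name dc_record are assumptions for illustration, and the slides do not say that MBG stores its records this way.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the "dc:" prefix


def dc_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair.

    A list of pairs is used instead of a dict so that an element such as
    "subject" can be repeated.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return root


# Values taken from the page/image slide above.
page = [
    ("title", "QK495F270L351797_0060.jpg"),
    ("creator", "Lambert, Aylmer Bourke, 1761-1842"),
    ("subject", "Cinchona"),
    ("subject", "Rubiaceae"),
    ("description", "Plate 9 - Cinchona angustifolia"),
    ("publisher", "Missouri Botanical Garden"),
    ("date", "1998-10-01"),
    ("type", "Image"),
    ("format", "jpeg"),
    ("identifier", "0060"),
    ("source", "QK495.F270 L35 1797"),
]
print(ET.tostring(dc_record(page), encoding="unicode"))
```

Splitting the pipe-delimited subject string into separate "subject" elements, as done here for two of the terms, is a design choice rather than something the slide prescribes.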
Subject Access
• Controlled vocabularies
  – Vocabularies and thesauri
  – Taxonomies
  – Access
XML
• METADATA
  – descriptive
    – facilitate discovery
      • OAI
      • MARC
      • EAD
      • Dublin Core
  – administrative
    – identify/manage/preserve digital object(s) over time
      • info on where pieces reside
      • info on how to view digital object
      • info on scanning process
XML
• METADATA cont.
  – structural
    – storage/presentation of digital object(s)
      • METS (Metadata Encoding and Transmission Standard) http://www.loc.gov/standards/mets
      • TEI (Text Encoding Initiative) http://www.tei-c.org
      • TEI for Libraries (5 levels of encoding) http://www.indiana.edu/~letrs/tei/
      • METAe - automatic metadata creation http://meta-e.uibk.ac.at
XML
• SGML/HTML/XML
  – Standard Generalized Markup Language (1986)
  – Hypertext Markup Language (1989)
  – eXtensible Markup Language (1996)
• XML
  – a document markup language for defining structured information
  – a language used by computers to define hidden information about the structure of a document
XML
• XML cont. - best of both worlds
  – storage
    • can store any kind of structured info/not limited to Web delivery
  – presentation
    • flexible development/design
XML
• XML is a lot simpler than SGML and is sometimes described as an 80/20 solution: you get 80% of the power of SGML for 20% of the effort
• You can use XML without thinking ahead and make up your elements en route as long as they nest within each other. This is called writing "well-formed" rather than "valid" XML. Purists discourage this but people will do it anyhow.
• XML is specifically designed to work easily with the Web.
  – http://facultyweb.at.nwu.edu/english/mmueller/ariadne/teixintro/index.htm
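The difference between "well-formed" and "valid" can be tried out directly. The sketch below, which assumes made-up element names such as plate and engraver, uses Python's standard xml.etree.ElementTree parser to test whether elements nest properly. That parser checks well-formedness only; checking validity against a DTD or schema needs a validating parser, which is outside this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, i.e. every element nests properly."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Element names are invented on the fly; no DTD is involved.
good = "<plate><title>Cinchona officinalis</title><engraver/></plate>"
# Same content, but the closing tags overlap instead of nesting.
bad = "<plate><title>Cinchona officinalis</plate></title>"

print(is_well_formed(good))
print(is_well_formed(bad))
```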
XML
• XML and NYBG digitization project
[Diagram components: XML text files, Images, GSDL software suite, Public access server, Public use]
XML
• XML/NYBG project
  – lack of adopted standards
  – nature of the data
  – delivery mechanisms
• Research!
XML
• XML sites
  – http://www.oasis-open.org/cover/sgml-xml.html
  – http://www.w3.org/XML/
  – http://www.ucc.ie/xml/#exec
  – http://www.xml.com/
• SGML sites
  – http://www.oasis-open.org/cover/general.html
  – http://www.w3.org/MarkUp/SGML/
• Listservs
  – http://sunsite.berkeley.edu/XML4Lib/
  – http://www.oasis-open.org/cover/lists.html
Scanning
• Principles for Scanning
• Access (not preservation)
• Storage
• Outsource options
Howard Besser’s Principles
• Scan at the highest resolution appropriate to the informational content of the originals
• Scan at an appropriate level of quality to avoid rescanning and re-handling of the originals in the future--scan once
• Create and store a master image file that can be used to produce derivative image files and serve a variety of current and future user needs
• Use system components that are non-proprietary
Besser’s Principles Cont.
• Use image file formats and compression techniques that conform to industry standards
• Create backup copies of all files on a stable medium
• Create meaningful metadata for image files or collections
• Store media in an appropriate environment
• Monitor and recopy data as necessary
• Outline a migration strategy for transferring data across generations of technology
• Anticipate and plan for future technological developments
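"Monitor and recopy data as necessary" is often put into practice with checksums: record a digest for each master file when it is created, then recompute it periodically and recopy from backup any file whose digest has changed. The following Python sketch illustrates that idea with the standard hashlib module. It is an assumption about how monitoring could be done, not something the principles above prescribe, and the function names and manifest format are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest):
    """Return the paths whose current digest differs from the recorded one.

    `manifest` maps a file path to the digest recorded at scan time.
    A missing file raises FileNotFoundError, which is itself a finding.
    """
    return [
        path
        for path, recorded in manifest.items()
        if sha256_of(path) != recorded
    ]
```

In use, the manifest would be built once with sha256_of for every master image and stored alongside the backups; a scheduled run of changed_files then lists the files that need to be recopied.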
Scan Basics
• Digital formats: master/archival, access, thumbnail
• Always keep a facsimile master
• Minimum recommended standards: NARA/LC/CPD
• Hardware requirements:
  – Scanner that exceeds your standards
  – Workstation: at least Pentium III, 650 MHz, storage (20+ gigabytes)
  – Server for display and archiving
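Storage needs for master files can be estimated before buying hardware. An uncompressed image takes roughly (width in inches × dpi) × (height in inches × dpi) × bytes per pixel. The Python sketch below works this out for an assumed 8.5 × 11 inch page scanned at an assumed 600 dpi in 24-bit color; the page size, resolution, and bit depth are illustrative choices, and 20 GB is taken as 20 × 10^9 bytes.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Approximate size in bytes of an uncompressed scan, ignoring file headers."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def images_per_disk(disk_bytes, image_bytes):
    """How many whole images of a given size fit on a disk."""
    return disk_bytes // image_bytes


# Assumed example: a letter-size page at 600 dpi, 24-bit color.
master = uncompressed_bytes(8.5, 11, 600)
print(master)
print(images_per_disk(20 * 10**9, master))
```

Under these assumptions one master is about 101 MB, so a 20 GB disk holds roughly 198 uncompressed masters. This is why access and thumbnail derivatives are kept separately and much smaller.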
MBG Imaging Lab Specs
• See handout
Scanning
• Your requirements may be different than the accepted norm
  – Maybe 600 dpi is too low for your project
• Should be aware of generally accepted guidelines
  – Have to know the rules before you break them
Scanning Guidelines
• Review handout
Scanning
• Software: scanners come with some basic software, Adobe Photoshop Lite
• Keep current on software
• Physical facilities for scanning
• When to outsource/special materials
Outsourcing
• What?
  – Contract work to service providers
  – Off-site, on-site, imaging only, image/content display/management provider, ASP (application service provider)
• Why?
  – Factors to consider
    • project size
    • project expectations
    • staff size
Outsourcing
• Why? Cont.
    • staff expertise
    • available resources (funding for staff training and equipment, physical space)
    • deadlines
Outsourcing
• NYBG/Mellon Digitization Project
  – 3 titles from RB collection
  – conservation efforts necessary
  – 21 month grant, no lab, no allocated space to build lab, no staff, no expertise, no extra funding for equipment or staff training, project expectations (grant stipulates archival quality imaging, hard deadline)
  – image capture outsourced to east coast vendor, quality checks performed in-house
Outsourcing
• Weighing the pros and cons
  – fragile/rare materials under supervised control vs. equipment costs and updates/staff/expertise/time/physical space
• Worth consideration
  – “…For digitization projects, institutions and service providers are working with developing technologies and a new vocabulary, creating new quality and production benchmarks, and trying to determine best practices. All the while, digital technology continues to evolve. Both parties must collaborate to determine capture requirements, costs, and deliverables; manage the process; and agree on criteria.” - Meg Bellinger, President, Preservation Resources, Moving Theory into Practice, 2000.
Outsourcing
• Vendors– Octavo http://www.octavo.com/
– Systems Integration Group http://www.sigi.com/
– Preservation Resources http://www.oclc.org/oclc/presres/
– Saztec http://www.saztec.com
– Innodata http://www.innodata.com/
– Northern Micrographics http://www.normicro.com/northern_micrographics.htm
Sustainability
• Digitization shouldn’t be a fling (when others are paying the bills). It is a marriage and more.
• Time = Money
• Permanence
• Data Migration and Emulation
• Review and schedule upgrades
• Documentation
Cost
• Not cheap, but consider the value of objects, the investment already made on your collections and your organizational mission.
• Prices range from $7 - $35 per image
• Most projects are funded on soft money. Attempt to incorporate scanning into normal operating budgets.
• Scanning is 1/3 of total cost.
• Largest cost is in research and time invested in creation of metadata or organization of collections.
Staffing
• Staff with tolerance for ambiguity
• Staff with creativity
• Training in metadata, scanning
• Photographic skills (artistic eye), microcomputer skills, web design skills
• Staff with risk taking attitude
Concluding Thoughts
• Create digital products worth preserving
• Collaborate!
• Adhere to standards
• Refresh/migrate your data
• Don’t forget preservation metadata: digital products are not copies, but new artifacts