DRI Training: Preparing Your Collection for DRI 2. Digitising Your Collection Digital Imaging – Introduction, components, process. Tim Keefe, Head of Digital Resources and Imaging Services, Trinity College Dublin [email protected]

Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection


Page 1: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

DRI Training: Preparing Your Collection for DRI
2. Digitising Your Collection

Digital Imaging – Introduction, components, process.

Tim Keefe, Head of Digital Resources and Imaging Services, Trinity College Dublin

[email protected]

Page 2: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Questions we all need to ask

When beginning a digitization project it is easy to ignore the basic questions, the ones we all assume we already know the answers to. These questions are often the most important, and they need to be addressed formally.

Page 3: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Questions to ask

What is the purpose of this project?
What is the scope of the digitization activity?
What is the intended lifetime of the digital files?
Who is the intended audience?

Page 4: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Purpose
What is the purpose of this project?
  Why are we digitizing the material?
    Need/Trend
    Access
    Research
    Education
  Who are the champions for this project?
    Local
    External
  Who or what are the barriers to the implementation of this project?
    Human
    Resource
    Procedural/Political

Page 5: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Scope
What is the scope of the digitization activity?
  What is to be digitized?
  What is not to be digitized? Why?
  Who is likely to demand operation outside of these criteria?

Page 6: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Intended Audience
Who is the intended audience for the digital resources?
  What are their needs?
  How will they access the material?
  Who else will be interested?
Are you prepared for a new audience (known or unknown) to self-select and become the primary audience?
Do you wish to prevent any audience from having access to the resources?

Page 7: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Image Lifetime

What is the intended lifetime for the digital records?
  This question is critical to the appropriate development of the digitization activity
    Significant resource implications
    Significant planning implications
    Significant digitization process implications

Page 8: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

So Why Digitize?
Access
  Electronic media provide the most dynamic access
  Digital data structures offer the opportunity for truly dynamic new research and educational models, adding unique new capabilities to existing methodologies
Preservation
  Digital files designed to proper specifications can be true surrogates for delicate source materials for all but a handful of advanced research needs
Manipulation
  Non-linear
  Digital resources allow for easy modification of image characteristics
  Digital files easily cross medium boundaries, providing opportunities for new use models

Page 9: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Problems with digitization
  Pace of technological change is constantly raising the bar for digital attributes
  Not human readable
  Lack of best practices / attribute recommendations
  Long-term digital preservation is a newly emerging field, with solutions just beginning to emerge
    Much more complex than having IS Services make a backup copy
  Extremely costly activity
    Total cost of ownership (TCO) not well understood, few models

Page 10: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Capture for What?

At TCD we designate the capture activity according to the intent for the object
Capturing for Content
  Speed and cost most important
  Quality less important
Capturing the Object
  Quality most important
  Meeting the needs of the researcher… researching anything

Page 11: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Components
The primary components of an average imaging system:
  Digital capture device
    Light source, if not included in the capture system
    Optics, if not included in the capture system
  Color calibration system
  Image capture / image processing computer system(s)
  Software packages
  Data storage systems

Page 12: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Scanners

Page 13: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Flatbed
  Reflective / transmissive capabilities
  Infrared dust and scratch removal systems (ICE)
  Linear / tri-linear or CCD systems
  Low productivity
  Inclusive of software

Page 14: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Flatbed (limitations)
  Works best with two-dimensional materials
  Not recommended for use with fragile or tightly bound material
  Limited scan area
  Very slow

Page 15: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems

35mm Photographic

Page 16: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Digital photographic systems
  35mm format
  CCD / CMOS digital capture sensors
  Full-frame or reduced-frame sensors
    1.5 to 1.33 avg. magnification values
  High productivity
  Limited resolution
  Limited bit depth (8-14 bit)
  Cost effective
  Good starting solution

Page 17: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems

Medium format (MF digital back)

Page 18: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Medium format (MF digital back)
  CCD sensors
  6 x 4.5cm to 6 x 7cm sensor size
  With and without micro-lenses
  High bit depth (16 bit)
  High productivity
  High cost
  Requires a high level of studio photographic experience
  Additional software needs
  Associated equipment also expensive

Page 19: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Dedicated Book Scanning Systems
  One size fits all… and all its limitations
  Limited source material input
  Material handling and support
  Possible automation
    Page turning, image management
  Linear or CCD based
  Digital camera based
  High to very high productivity

Page 20: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Dedicated Book Scanning Systems
  Linear CCD based, generally with included software (a flatbed in a different form factor)

Page 21: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Capture Systems
Dedicated Book Scanning Systems
  Digital camera based
  Robotic scanners

Page 22: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Robots…Really?

Page 23: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Computer Technology
What to buy
  Image processing is one of the more intensive computing tasks
  Recommendation is to buy the fastest, most modern computer that you can afford right now
    Memory requirements are often more critical than processor speed (software does not yet take full advantage of multi-core technology)
    Graphics card is often more important than processor
    Have a minimum RAM of 4x your largest file size… 8x recommended
    Will cost 2-5x more than a normal office computer
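As a rough illustration of the 4x / 8x memory guideline above, the arithmetic can be sketched in a few lines. The function name and the 1.5 GB example file are hypothetical, chosen only to show the calculation.

```python
def ram_guideline_gb(largest_file_gb: float) -> dict:
    """Return the minimum (4x) and recommended (8x) RAM for the largest file size."""
    return {
        "minimum_gb": 4 * largest_file_gb,
        "recommended_gb": 8 * largest_file_gb,
    }


# Hypothetical example: the largest master file in the project is 1.5 GB.
print(ram_guideline_gb(1.5))
```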

Page 24: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Computer Technology
  Consider the software needs of the digital capture system you have chosen
    Is software for generating the files required by your project scenario or device type?
    Some MF camera systems require unique software
  Will it be necessary to purchase additional image editing software packages (e.g. Adobe Creative Suite / Photoshop) or file management software (Lightroom, Bridge, etc.)?
    Many of these software packages are now subscription based

Page 25: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Storage Technology
RAID (Redundant array of inexpensive disks)
  Level 0 (striped) – Speed and performance increases
    Data is broken up and written across several disks, taking advantage of multiple writing heads to improve data throughput (often used for video processing)
  Level 1 (mirrored) – Security through redundancy
    Data is identically written to more than one disk, allowing for backup protection should any single disk fail
    The overall data storage volume of the system is halved when a Level 1 RAID is activated
Local hard drive (under-the-desk solution)
  Low cost, lowest preservation (use only when required)
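The capacity trade-off between the two RAID levels described above can be sketched as follows. This is a simplified model that assumes identical disks and ignores filesystem overhead; the function name and disk sizes are illustrative only.

```python
def usable_capacity_tb(level: int, disk_tb: float, disks: int) -> float:
    """Usable capacity for the two RAID levels discussed (identical disks assumed)."""
    if disks < 2:
        raise ValueError("RAID 0 and RAID 1 both need at least two disks")
    if level == 0:
        # Striping: data is spread across all disks, so all capacity is usable.
        return disk_tb * disks
    if level == 1:
        # Mirroring: every disk holds the same data, so only one disk's worth is usable.
        return disk_tb
    raise ValueError("only RAID levels 0 and 1 are modelled in this sketch")


# Two 4 TB disks: striped gives 8 TB, mirrored gives 4 TB (half the raw volume).
print(usable_capacity_tb(0, 4, 2), usable_capacity_tb(1, 4, 2))
```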

Page 26: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Digital Vocabulary

File structure
File types
Compression
Spatial resolution
Bit depth
Dynamic range
Color mode

Page 27: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

File Types
TIFF (Tagged Image File Format)
  Large file size
  Standard format
  Lossless LZW compression (and lossy options)
JPEG (Joint Photographic Experts Group)
  Smaller file sizes
  Lossy compression in most cases, but newest versions support lossless (rarely supported)
  Standard format
JPEG 2000 (lossless and/or lossy)
  Multiple file sizes embedded within a single digital record
  Emerging format (adoption very slow, caution)

Page 28: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

File Types cont.
PDF (Portable Document Format - Adobe Acrobat)
  Advanced cross-platform compatibility
  Ability to support complex document generation
    Text, images, notes, embedded graphics, etc.
  Support for advanced printing
  Support for sharing and dissemination
  Standard file type
    Caution, as there is a wide variety of versions and variants
  Digital preservation ISO standard: PDF/A ("type A" files)
    Adoption rate very low
    Some believe that political / corporate influence drove this recommendation
GIF
  Dying file format, not recommended

Page 29: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

File Compression
Two basic types of compression: lossy and lossless
Lossy
  Image structure is changed (damaged) by the compression activity, but not in a perceptible way
  JPEG is the most common format using lossy compression
  Every file save increases the damage
    File conversion/save into a lossy format should always be the final step in the digitization and image processing workflow
  Large reduction in file size

Page 30: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

File Saving

Save Order
  When working with files that use or will use a lossy compression (JPEG), it is important that the file save is the very last step in the process
    Each save recompresses the data and causes further image degradation
  It is best practice to work in a lossless format such as TIFF, and save out the final JPEG as a last step. This workflow will minimize the impact of the compression artifacts

Page 31: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Compression cont.

Lossless
  Image file structure is not changed in any way by the compression activity
  The TIFF file format with LZW compression is the most widely used lossless compression format
    Note that a TIFF file can also be generated with no compression or with lossy compression
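The difference between the two compression types can be sketched with the Python standard library. This is an illustration under stated assumptions, not how TIFF or JPEG work internally: `zlib` (DEFLATE) stands in for a lossless codec such as LZW, and a crude quantisation step stands in for a lossy codec. The sample data and function names are hypothetical.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with DEFLATE; the output is bit-for-bit identical."""
    return zlib.decompress(zlib.compress(data, 9))


def lossy_quantize(data: bytes, step: int = 16) -> bytes:
    """Crude stand-in for lossy coding: snap each value down to a multiple of step."""
    return bytes((value // step) * step for value in data)


# Hypothetical 8-bit greyscale samples: a ramp from 0 to 255, repeated four times.
pixels = bytes(range(256)) * 4

print(lossless_round_trip(pixels) == pixels)  # original data fully recovered
print(lossy_quantize(pixels) == pixels)       # detail discarded, cannot be restored
```

Once values have been quantised there is no way to recover the originals, which is why the lossy save belongs at the end of the workflow.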

Page 32: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Compression examples

Page 33: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Resolution
  This metric is generally stated as pixels per inch (ppi): the number of individual picture elements that fit along one linear inch of the sample
    This is sometimes confused with dots per inch (dpi), which is a printing-specific metric
  Spatial resolution requires both dimensional measurements and the ppi sample rate
  Screen resolution is nominally 72 ppi (newest technology screens now exceed 125 ppi)
  High-resolution commercial printing requires 300-650 ppi image files
  General internet JPEG files: 72-150 ppi
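Because spatial resolution combines physical dimensions with the ppi sample rate, the pixel dimensions and uncompressed size of a capture can be estimated in advance. The sketch below assumes an RGB image and ignores file headers and metadata; the function names and the 8 x 10 inch example are illustrative.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple:
    """Pixel width and height of an original of the given size sampled at ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int = 3, bits_per_channel: int = 8) -> int:
    """Raw image data size, ignoring headers and metadata."""
    return width_px * height_px * channels * bits_per_channel // 8


# An 8 x 10 inch original captured at 300 ppi.
w, h = pixel_dimensions(8, 10, 300)
print(w, h, uncompressed_bytes(w, h), uncompressed_bytes(w, h, bits_per_channel=16))
```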

Page 34: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Bit Depth

Bit depth is the number of samples provided within each image channel (RGB, CMYK)
  This term is often confused with dynamic range
    They are not the same; however, there is an interaction between them
  The number of discrete steps between black and white

Page 35: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Bit Depth

Bit depth is stated as the number of bits of data per channel
  The number of steps is 2 (binary measure) raised to the power of the bit depth, so 4-bit color has 16 steps between the black and white values
  Note that bit depth is stated either as the number of bits per channel, as in 8-bit color, or as the sum of all the channels combined (R+G+B = 24-bit color)… this can be confusing

Page 36: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Bit Depth
8 bits per channel (or 24-bit color)
  256 value steps in each channel
  16.8 million possible colors
16 bits per channel (or 48-bit color)
  65,536 value steps in each channel
  281.5 trillion possible colors
Many manufacturers talk about interim bit depths (12-14), but the final output is often reduced to 8 bits per channel
  You cannot add missing data by moving to a higher bit depth
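The figures above follow directly from powers of two, and a short sketch makes the arithmetic explicit. It also illustrates the last point: promoting 8-bit values to a 16-bit scale does not create new tonal steps. The function names are illustrative, and multiplying by 257 is one common way to stretch 0-255 onto 0-65535.

```python
def steps_per_channel(bits: int) -> int:
    """Number of discrete values one channel can hold."""
    return 2 ** bits


def total_colors(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors across all channels combined."""
    return 2 ** (bits_per_channel * channels)


def promote_8_to_16(values):
    """Stretch 8-bit values onto the 16-bit scale (0 -> 0, 255 -> 65535)."""
    return [v * 257 for v in values]


print(steps_per_channel(4), steps_per_channel(8), steps_per_channel(16))
print(total_colors(8), total_colors(16))

# Still only 256 distinct values after promotion: no data has been added.
print(len(set(promote_8_to_16(range(256)))))
```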

Page 37: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Dynamic range
  Dynamic range is the ability of a sensor to simultaneously capture dark detail and light detail
    This is an inherent weakness of digital capture
    Decisions are made to set the device to support a greater tonal range of either dark densities (more common) or light
  Commonly confused with bit depth
    They are separate characteristics despite all the contrary information out there (much of it from reputable sources)… I promise
    Greater bit depth will not automatically provide greater dynamic range (however, improvements in bit depth often accompany other sensor improvements that include increased DR)

Page 38: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Dynamic Range Clipping

Clipping is a failure state of a digital image in which the limited dynamic range of a device is unable to correctly capture either very light or very dark tones
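A toy model can make clipping concrete. The sketch below assumes a device that responds linearly between a black point and a white point and records everything outside that range as pure black or pure white. The scene values are in arbitrary, made-up units.

```python
def capture(scene_values, black_point, white_point, max_code=255):
    """Map scene luminance to output codes, clipping outside the device's range."""
    codes = []
    for value in scene_values:
        if value <= black_point:
            codes.append(0)            # shadow detail clipped to black
        elif value >= white_point:
            codes.append(max_code)     # highlight detail clipped to white
        else:
            codes.append((value - black_point) * max_code // (white_point - black_point))
    return codes


# Hypothetical scene: the device can only distinguish values between 100 and 900.
scene = [0, 50, 100, 500, 900, 1000, 1200]
print(capture(scene, 100, 900))
```

Distinct dark tones (0, 50, 100) all land on the same code, as do the distinct light tones (900, 1000, 1200), so the difference between them is lost in the image file.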

Page 39: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Mode

RGB (Red/Green/Blue color channels)
  Additive color
  Most common color mode for digital images
  Mimics human visual system

Page 40: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Mode
CMYK (Cyan/Magenta/Yellow/Black)
  Subtractive color
  Commercial printing standard
    Most desktop color printers support RGB color files (CMYK conversion is internally managed)
  Limited color gamut

Page 41: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Mode
Lab color
  Single luminance (grey scale) channel and 2 opposing color channels
  Loosely represents the range of human vision
  Good for transforms

Page 42: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Profile Standards
  The user-defined color profile assigned to the image files supports several informal standard configurations
sRGB
  Profile developed more than a decade ago by HP and Microsoft
  Represents the gamut of an average CRT monitor
  Very limited color palette
  New output devices are currently capable of exceeding this space
  Most commonly used profile (usually the default if not stated)

Page 43: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Profile Standards
Adobe RGB 1998
  Newer profile designed to support a wider palette of colors for higher quality printing
  Lower use than sRGB, but well recognized
  Maintains a color appearance consistent with sRGB devices
ProPhoto RGB
  A wide-gamut color space designed for very high quality printing of photographic images
  Color appearance is highly inconsistent when used with devices that are not color managed, or that are set to sRGB standards
  Despite the benefits of this color space, its use is quite limited due to the setup and management requirements
  Caution in its use, as inaccurate color characteristics can occur with improperly managed devices

Page 44: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Image Processing

Post-capture modifications and manipulations to the original digital image file structure

Page 45: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

The Controversy

Two primary schools of thought
  The digital master image files should remain untouched as they emerge from the capture device, and all subsequent processing should occur only on the surrogates
  Image processing will occur on the master capture file, with the intent of matching the original source material as closely as possible at the time of capture

Page 46: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Mode
RGB
  Standard image space for files
  Common, not likely to change
CMYK
  Avoid this space for all but specific commercial printing activities (even then try to ignore it)
Lab
  Great for processing transforms that can benefit from a luminance channel
    Sharpening
    Noise removal
  No color profile

Page 47: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

File Formats
Master
  This is the high-quality, large image generated from the capture device
Surrogates
  These are secondary files generated from the master file to be used for specific purposes

Page 48: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

File Format Sets
Master
  TIFF
    This is intended to be the highest quality image
    Represents the asset derived from the € spent
    Lossless compression recommended
Compressed JPEGs
  File size reduced for easier management and dissemination, and to manage costs
  Lossy compression is acceptable within the use cases
  Often several sizes (large, small, thumbnail)
  Used for public display
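When several surrogate sizes are derived from one master, each needs target pixel dimensions that preserve the aspect ratio. The sketch below shows only that arithmetic. The size names and longest-side limits are hypothetical, not values from this presentation, and the actual resampling and JPEG save would be done in an imaging tool.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple:
    """Scale dimensions so the longest side equals max_side, never upscaling."""
    scale = max_side / max(width, height)
    if scale >= 1:
        return width, height
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical surrogate set derived from a 6000 x 4000 pixel master.
SURROGATE_SIZES = {"large": 1200, "small": 600, "thumbnail": 150}

for name, max_side in SURROGATE_SIZES.items():
    print(name, fit_within(6000, 4000, max_side))
```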

Page 49: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Image Manipulations

Tone Scale
  To adjust tone scale you need to push or pull predetermined black and white values to defined positions on the histogram
    This requires the use of a calibrated reference target placed within the image
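The push/pull described above amounts to a linear remapping of pixel values so that the measured black and white patches of the reference target land on their intended positions. The following is a minimal sketch for a single 8-bit channel; the measured and aim values are made up for illustration and do not come from any particular target specification.

```python
def tone_scale(value, measured_black, measured_white,
               target_black, target_white, max_code=255):
    """Linearly remap a value so the measured patches land on their target values."""
    scale = (target_white - target_black) / (measured_white - measured_black)
    remapped = target_black + (value - measured_black) * scale
    return min(max_code, max(0, round(remapped)))


# Hypothetical readings: the target's black patch measured 30 and its white patch 230,
# while the aim values for those patches are assumed to be 10 and 245.
for v in (30, 130, 230):
    print(v, "->", tone_scale(v, 30, 230, 10, 245))
```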

Page 50: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Image Manipulations
Sharpening
  Sharpening works by increasing the contrast at edges in an image. This change in contrast fools the human visual system into believing that the image is sharper
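The edge-contrast effect can be seen on a single row of pixel values. The sketch below uses a simple unsharp mask (a 3-pixel box blur, with the difference from the blur added back), which is one common sharpening technique and not necessarily the one used by any particular software package. The function name and sample row are illustrative.

```python
def unsharp_mask_1d(row, amount=1.0, max_code=255):
    """Sharpen one row of 8-bit values: value + amount * (value - blurred value)."""
    result = []
    last = len(row) - 1
    for i, value in enumerate(row):
        left = row[max(i - 1, 0)]        # repeat the edge pixel at the borders
        right = row[min(i + 1, last)]
        blurred = (left + value + right) / 3
        sharpened = value + amount * (value - blurred)
        result.append(min(max_code, max(0, round(sharpened))))
    return result


# A soft edge between a mid-dark area (100) and a mid-light area (200).
row = [100, 100, 100, 200, 200, 200]
print(unsharp_mask_1d(row))
```

Only the two pixels at the edge change: the dark side gets darker and the light side gets lighter, so the step across the edge grows while flat areas are untouched.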

Page 51: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Image Manipulation

Sharpening

Page 52: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Cropping
  Cropping is the permanent removal of unwanted parts of the image
  Formally determine where the borders of your images should be
    For research purposes the entire page should be represented
    For access and content-related scanning, cropping to the textual areas of the page may be desired
  Failure modes
    What determines a crop or image capture that is unacceptable, requiring reprocessing or a new capture?
    Formalize this

Page 53: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Skew/Rotation
  When the source material is not perpendicular to the edges of the digital image
  Failure mode
    Determine what percent is unacceptable
    Formalize these criteria
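One way to formalize a skew criterion is to measure the angle of a page edge from two points along it and compare that angle with an agreed tolerance. The sketch below expresses the limit in degrees rather than percent, and the 1 degree default is a placeholder assumption, not a recommended value.

```python
import math


def skew_degrees(p1, p2):
    """Angle of the line through two (x, y) points relative to the image's horizontal axis."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    return math.degrees(math.atan2(dy, dx))


def within_tolerance(angle_degrees, max_degrees=1.0):
    """True if the measured skew is acceptable under the (assumed) tolerance."""
    return abs(angle_degrees) <= max_degrees


# Two points picked along the top edge of a page, in pixel coordinates.
angle = skew_degrees((0, 0), (1000, 10))
print(round(angle, 3), within_tolerance(angle))
```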

Page 54: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

White Balance
  White balance is a color balancing function used to address the color differences imparted by varying light sources
    The human visual system does this automatically in the brain, removing the real color cast imparted by the source illuminant and giving us the perception that most lights are white
    Think of the differences evident when you have a desktop incandescent bulb in a room lit by fluorescent light
    This is also important in the environment where your image processing occurs

Page 55: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

White Balance
  Most white balance is preset within the capture system; however, fine tuning or custom profiles can be applied in the processing stage
  Neutral 18% grey references are used to generate a custom balance
  When adjusting tone scale in Photoshop, neutral grey adjustment can be used to correct white balance inconsistencies
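The idea behind a grey-reference balance is that a neutral patch should have equal red, green and blue values, so per-channel gains can be derived from how the patch was actually recorded. The sketch below balances to the patch's mean value, which is one simple choice among several (balancing to the green channel is another). The patch reading is hypothetical, and this is not a description of what Photoshop does internally.

```python
def gains_from_grey(patch_rgb):
    """Per-channel multipliers that make a neutral grey patch come out neutral."""
    r, g, b = patch_rgb
    target = (r + g + b) / 3
    return (target / r, target / g, target / b)


def apply_gains(pixel, gains, max_code=255):
    """Apply the gains to one RGB pixel, clamping to the 8-bit range."""
    return tuple(min(max_code, round(c * k)) for c, k in zip(pixel, gains))


# Hypothetical reading of the grey reference under a warm light: too much red, too little blue.
grey_patch = (140, 120, 100)
gains = gains_from_grey(grey_patch)
print(apply_gains(grey_patch, gains))
```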

Page 56: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Quality Control/Assurance

Imaging and image processing are a highly repetitive, human-dependent set of processes and are therefore highly susceptible to regular error

Page 57: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Control vs. Assurance

Control consists of in-process activities that ensure quality in the creation of the products (digital images)

Assurance is focused on an evaluation of the processes used, and generally takes place outside of the creation process

Page 58: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Quality Control

Processes built into the imaging workflow to ensure that the creation of digital images is
  Consistent
  Accurate
  Repeatable
Often automated, these processes are inherently part of the imaging workflow

Page 59: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Quality Assurance

The Quality Assurance Audit
  Formal… informal just does not work
  Existing toolsets developed for a variety of manufacturing-based industries are highly effective
    TQM
    Six Sigma
    Etc.
  Takes place fully outside of the imaging processes

Page 60: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Quality Assurance Testing

What to test for
  Imaging
  File structure metrics
    Naming, page counts
  System/Network (positioning, backup, etc.)
  Metadata
    Structure
    Accuracy
    Completeness
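Checks on naming and page counts lend themselves to automation. The sketch below audits a list of filenames against an assumed naming convention (`<item>_<4-digit page>.tif`) and an expected page count. Both the convention and the example names are hypothetical and would need to be replaced by the project's own documented rules.

```python
import re

# Hypothetical naming convention: item identifier, underscore, 4-digit page number.
NAME_PATTERN = re.compile(r"^(?P<item>[A-Za-z0-9]+)_(?P<page>\d{4})\.tif$")


def audit_item(filenames, expected_pages):
    """Return a list of problems found; an empty list means the item passed."""
    problems = []
    pages_found = set()
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            problems.append(f"bad name: {name}")
        else:
            pages_found.add(int(match.group("page")))
    missing = sorted(set(range(1, expected_pages + 1)) - pages_found)
    if missing:
        problems.append(f"missing pages: {missing}")
    return problems


print(audit_item(["MS58_0001.tif", "MS58_0002.tif", "MS58_0003.tif"], 3))
print(audit_item(["MS58_0001.tif", "MS58-2.tif"], 3))
```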

Page 61: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Management

One of the most critical, and often ignored, components of a successful digitization project is a well-planned color management strategy

Page 62: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Management

Within any imaging and processing system you need to ensure that consistent color is displayed from device to device, and that a file's color metrics are electronically recognized

Technology required
  Capture reference targets
  Color profiles / ICC

Page 63: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Reference Targets

Allows a formal measured reference to be associated with the image (future proofing)

Page 64: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Color Management Technology
Color meters (basic screen calibration)
  Absorptive measurements
  Less dynamic than spectrophotometers
Spectrophotometers (advanced CM)
  Can measure the intensity of light as a function of the wavelength of the light
    Light absorption
    Diffuse
    Specular

Page 65: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

CM Standards
ICC (International Color Consortium)
  Works through a standardized Color Matching Module (CMM) connection space
  Not an ideal solution, but one that has been very well adopted by most imaging-related hardware and software vendors
ColorSync (Apple Computer)
  Apple solution to color management
  Part of the Macintosh system software
  Generally plays well with others; occasionally some fiddling is necessary (ICC integrated)
  Hands-off approach

Page 66: Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

Further Reading and Resources
DRI and Digital File Format Choices Factsheet:
  http://dri.ie/sites/default/files/files/dri-factsheets-file-formats.pdf
DRI Long-Term Digital Preservation Factsheet:
  http://tinyurl.com/hbp28xe
Online Resources for Digitisation Projects:
  http://dri.ie/digitisation-resources - includes resources for Project Planning, File Formats, Audio & Audiovisual, Hardware, Metadata & Vocabularies and Policy
Trinity College Dublin Digital Collections Repository:
  https://www.tcd.ie/Library/dris/digital.php