Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection

DRI Training: Preparing Your Collection for DRI

2. Digitising Your Collection

Digital Imaging – Introduction, components, process.

Tim Keefe, Head of Digital Resources and Imaging Services, Trinity College Dublin

keefet@tcd.ie

Questions we all need to ask

When beginning a digitization project it is easy to ignore the basic questions, the ones we all assume we know the answers to. These questions are often the most important, and they need to be addressed formally.

Questions to ask

What is the purpose of this project?
What is the scope of the digitization activity?
What is the intended lifetime of the digital files?
Who is the intended audience?

Purpose

What is the purpose of this project?
Why are we digitizing the material?
  Need/Trend
  Access
  Research
  Education
Who are the champions for this project?
  Local
  External
Who or what are the barriers to the implementation of this project?
  Human
  Resource
  Procedural/Political

Scope

What is the scope of the digitization activity?
What is to be digitized?
What is not to be digitized? Why?
Who is likely to demand operation outside of these criteria?

Intended Audience

Who is the intended audience for the digital resources?
  What are their needs?
  How will they access the material?
  Who else will be interested?
Are you prepared for a new audience (known or unknown) to self-select and become the primary audience?
Do you wish to prevent any audience from having access to the resources?

Image Lifetime

What is the intended lifetime for the digital records?
This question is critical to the appropriate development of the digitization activity
  Significant resource implications
  Significant planning implications
  Significant digitization process implications

So Why Digitize?

Access
  Electronic media provide the most dynamic access
  Digital data structures offer the opportunity for truly dynamic new research and educational models, adding unique new capabilities to existing methodologies
Preservation
  Digital files designed to proper specifications can be true surrogates for delicate source materials for all but a handful of advanced research needs
Manipulation
  Non-linear
  Digital resources allow for easy modification of image characteristics
  Digital files easily cross medium boundaries, providing opportunities for new use models

Problems with digitization

The pace of technological change is constantly raising the bar for digital attributes
Not human readable
Lack of best practices / attribute recommendations
Long-term digital preservation is a newly emerging field, with solutions just beginning to emerge
  Much more complex than having IS Services make a backup copy
Extremely costly activity
  TCO not well understood, few models

Capture for What?

In TCD we designate the capture activity based on the intent for the object
Capturing for Content
  Speed and cost most important
  Quality less important
Capturing the Object
  Quality most important
  Meeting the needs of the researcher… researching anything

Components

The primary components of an average imaging system:
  Digital capture device
  Light source, if not included in the capture system
  Optics, if not included in the capture system
  Color calibration system
  Image capture / image processing computer system(s)
  Software packages
  Data storage systems

Digital Capture Systems

Scanners

Digital Capture Systems

Flatbed

Reflective/transmissive capabilities
Infrared dust and scratch removal systems (ICE)
Linear/tri-linear or CCD systems
Low productivity
Inclusive of software

Digital Capture Systems

Flatbed (limitations)

Works best with two-dimensional materials
Not recommended for use with fragile or tightly bound material
Limited scan area
Very slow

Digital Capture Systems

35mm Photographic

Digital Capture Systems

Digital Photographic systems

35mm format
CCD / CMOS digital capture sensors
Full-frame or reduced-frame sensors
  1.5 to 1.33 avg. magnification values
High productivity
Limited resolution
Limited bit depth (8-14 bit)
Cost effective
Good starting solution

Digital Capture Systems

Medium format (MF digital back)

Digital Capture Systems

Medium format (MF digital back)

CCD sensors
6 x 4.5 cm to 6 x 7 cm sensor size
With and without micro-lenses
High bit depth (16 bit)
High productivity
High cost
Requires a high level of studio photographic experience
Additional software needs
Associated equipment also expensive

Digital Capture Systems

Dedicated Book Scanning Systems

One size fits all… and all its limitations
  Limited source material input
  Material handling and support
Possible automation
  Page turning, image management
Linear or CCD based
Digital camera based
High to very high productivity

Digital Capture Systems

Dedicated Book Scanning Systems

Linear CCD based, generally with included software (a flatbed in a different form factor)

Digital Capture Systems

Dedicated Book Scanning Systems

Digital camera based
Robotic scanners

Robots…Really?

Computer Technology

What to buy

Image processing is one of the more intensive computing tasks
Recommendation is to buy the fastest, most modern computer that you can afford right now
  Memory requirements are often more critical than processor speed (software does not yet take full advantage of multi-core technology)
  Graphics card often more important than processor
  Have a minimum RAM of 4x your largest file size… 8x recommended
Will cost 2-5x more than a normal office computer
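The 4x and 8x RAM rule of thumb can be worked through as a small calculation. This is a minimal sketch: the function names and the 8000 x 6000 pixel, 16 bit RGB capture are assumptions chosen for illustration, and the size estimate covers raw pixel data only (no headers, layers or compression).

```python
def uncompressed_size_bytes(width_px, height_px, channels=3, bits_per_channel=16):
    """Size of the raw pixel data, ignoring file headers and compression."""
    return width_px * height_px * channels * bits_per_channel // 8


def ram_guideline_bytes(largest_file_bytes):
    """Return (minimum, recommended) RAM using the 4x and 8x rules of thumb."""
    return 4 * largest_file_bytes, 8 * largest_file_bytes


# Hypothetical largest capture: 8000 x 6000 pixels, RGB, 16 bits per channel.
largest = uncompressed_size_bytes(8000, 6000)
minimum, recommended = ram_guideline_bytes(largest)

print(f"largest file: {largest / 1e6:.0f} MB")
print(f"minimum RAM: {minimum / 1e9:.2f} GB, recommended: {recommended / 1e9:.2f} GB")
```

Layered working files can be several times larger than the flat capture, so the largest file actually opened during processing is the figure to plug in.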

Computer Technology

Consider the software needs of the digital capture system you have chosen
  Is software for generating the files required by your project scenario or device type?
  Some MF camera systems require unique software
Will it be necessary to purchase additional image editing software packages (e.g. Adobe Creative Suite / Photoshop) or file management software (Lightroom, Bridge, etc.)?
  Many of these software packages are now subscription based

Storage Technology

RAID (redundant array of inexpensive disks)
  Level 0 (striped) – Speed and performance increases
    Data is broken up and written across several disks, taking advantage of multiple writing heads to improve data throughput (often used for video processing)
  Level 1 (mirrored) – Security through redundancy
    Data is written identically to more than one disk, allowing for backup protection should any single disk fail
    The overall data storage volume of the system is halved when a level 1 RAID is activated
Local hard drive (under the desk solution)
  Low cost, lowest preservation (use only when required)
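The capacity trade-off between the two RAID levels described here can be sketched as follows. The function name is hypothetical, identical disks are assumed, and only levels 0 and 1 are modelled because those are the two covered above.

```python
def raid_usable_capacity_tb(level, disk_count, disk_size_tb):
    """Usable capacity for RAID 0 (striped) or RAID 1 (mirrored), identical disks assumed."""
    if disk_count < 2:
        raise ValueError("RAID 0 and RAID 1 need at least two disks")
    if level == 0:
        # Striping: every disk contributes capacity, and there is no redundancy.
        return disk_count * disk_size_tb
    if level == 1:
        # Mirroring: every disk holds the same data, so only one disk's worth is usable.
        return disk_size_tb
    raise ValueError("only RAID levels 0 and 1 are modelled in this sketch")


# Two hypothetical 4 TB disks.
print(raid_usable_capacity_tb(0, 2, 4.0))  # striped
print(raid_usable_capacity_tb(1, 2, 4.0))  # mirrored: half of the raw 8 TB
```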

Digital Vocabulary

File structure
File types
Compression
Spatial resolution
Bit depth
Dynamic range
Color mode

File Types

Tiff (Tagged Image File Format)
  Large file size
  Standard format
  Lossless compression: LZW (and lossy options)
Jpeg (Joint Photographic Experts Group)
  Smaller file sizes
  Lossy compression in most cases, but newest versions support lossless (rarely supported)
  Standard format
Jpeg 2000 (lossless and/or lossy)
  Multiple file sizes embedded within a single digital record
  Emerging format (adoption very slow, caution)

File Types cont.

PDF (Portable Document Format - Adobe Acrobat)
  Advanced cross-platform compatibility
  Ability to support complex document generation
    Text, images, notes, embedded graphics, etc.
  Support for advanced printing
  Support for sharing and dissemination
  Standard file type
    Caution, as there are a wide variety of versions and variants
  Digital preservation ISO standard: PDF/A (Acrobat type A) files
    Adoption rate very low
    Some believe that this standard had political / corporate influence driving the recommendation
GIF
  Dying file format, not recommended

File Compression

Two basic types of compression: lossy and lossless

Lossy
  Image structure is changed (damaged) by the compression activity, but ideally not in a perceptible way
  Jpeg is the most common format using lossy compression
  Every file save increases the damage
    File conversion/save into a lossy format should always be the final step in the digitization and image processing process
  Large reduction in file size

File Saving

Save Order
  When working with files that use or will use a lossy compression (Jpeg), it is important that the file save is the very last step in the process
    Each save recompresses the data and causes further image degradation
  It is best practice to work in a lossless format such as Tiff, and save out the final Jpeg as a last step. This workflow will minimize the impact of compression artifacts

Compression cont.

Lossless
  Image file structure is not changed in any way by the compression activity
  The Tiff file format with LZW compression is the most widely used lossless compression format
    Note: the Tiff file format can also be generated with no compression or with lossy compression
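The defining property of lossless compression, that the decompressed data is bit-for-bit identical to the original, can be checked with a round trip. This sketch uses Python's zlib module (Deflate) as a stand-in because it ships with the language; it is not the LZW scheme used in Tiff, and the pixel data is made up for illustration.

```python
import zlib

# Made-up "image" data: a repetitive run of grey values, which compresses well.
pixels = bytes([10, 10, 10, 200, 200, 200] * 1000)

compressed = zlib.compress(pixels)
restored = zlib.decompress(compressed)

# Lossless: every byte survives the round trip unchanged.
print("identical after round trip:", restored == pixels)
print("original bytes:", len(pixels), "compressed bytes:", len(compressed))
```

A lossy format would fail the equality check by design, which is why repeated saves in a lossy format accumulate damage.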

Compression examples

Resolution

This metric is generally stated as pixels per inch (ppi), the number of individual picture elements that fit along one linear inch
This is sometimes confused with dots per inch (dpi), which is a printing-specific metric
Spatial resolution requires dimensional measurements and a ppi sample rate
Screen resolution is 72 ppi (newest technology screens now exceed 125 ppi)
High resolution commercial printing requires 300-650 ppi image files
General internet jpg files: 72-150 ppi
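Because spatial resolution needs both a physical dimension and a ppi sample rate, the pixel dimensions of a capture follow directly from the two. A minimal sketch, with hypothetical function names and an 8 x 10 inch original chosen for illustration:

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel width and height needed to sample an original at the given ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def output_size_inches(width_px, height_px, ppi):
    """Physical size at which an image reproduces for a given output ppi."""
    return width_px / ppi, height_px / ppi


# An 8 x 10 inch original sampled for print (300 ppi) and for screen (72 ppi).
print(pixel_dimensions(8.0, 10.0, 300))
print(pixel_dimensions(8.0, 10.0, 72))

# The same 2400 x 3000 pixel file viewed at 72 ppi covers a much larger area.
print(output_size_inches(2400, 3000, 72))
```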

Bit Depth

Bit depth is the number of samples provided within each image channel (RGB, CMYK)
This term is often confused with dynamic range
  They are not the same; however, there is an interaction between them
The number of discrete steps between black and white

Bit Depth

Bit depth is stated in the number of bits of data per channel
The number of steps is 2 (binary measure) raised to the power of the bit depth number, so 4 bit color will have 16 steps between the black and white values
** Note that bit depth is stated either as the number of bits per channel, as in 8 bit color, or as the sum of all the channels combined (R+G+B) = 24 bit color… this can be confusing

Bit Depth

8 bits per channel (or 24 bit color)
  256 value steps in each channel
  16.8 million possible colors
16 bits per channel (or 48 bit color)
  65,536 value steps in each channel
  281.5 trillion possible colors
Many manufacturers talk about interim bit depths (12-14), but the final output is often reduced to 8 bits per channel
You cannot add missing data by moving to a higher bit depth
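The step and color counts quoted for each bit depth come from powers of two. A short sketch of the arithmetic (the function names are illustrative, and three channels are assumed, as in RGB):

```python
def levels_per_channel(bits_per_channel):
    """Number of discrete value steps in one channel."""
    return 2 ** bits_per_channel


def total_colors(bits_per_channel, channels=3):
    """Number of possible colors when every channel varies independently."""
    return levels_per_channel(bits_per_channel) ** channels


for bits in (4, 8, 16):
    print(
        f"{bits} bits per channel ({bits * 3} bit color): "
        f"{levels_per_channel(bits):,} steps, {total_colors(bits):,} colors"
    )
```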

Dynamic range

Dynamic range is the ability of a sensor to simultaneously capture dark detail and light detail
This is an inherent weakness of digital capture
  Decisions are made to set the device to support a greater tonal range of either dark densities (more common) or light densities
Commonly confused with bit depth
  They are separate characteristics, despite all the contrary information out there (much of it from reputable sources)… I promise
  Greater bit depth will not automatically provide greater dynamic range (however, improvements in bit depth often accompany other sensor improvements that include increased DR)

Dynamic Range Clipping

Clipping is a failure state of a digital image in which the limited dynamic range of a device is unable to correctly capture either very light or very dark tones
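One simple way to look for clipping is to count how many pixel values are piled up at the extreme ends of the scale. This is a rough sketch under stated assumptions: the function name and sample values are made up, a single channel is examined, and a value at 0 or at the maximum is only a hint of clipping, not proof.

```python
def clipped_fraction(values, bits_per_channel=8):
    """Return (shadow, highlight) fractions of values sitting at the scale limits."""
    if not values:
        raise ValueError("no pixel values supplied")
    max_value = 2 ** bits_per_channel - 1
    shadows = sum(1 for v in values if v <= 0)
    highlights = sum(1 for v in values if v >= max_value)
    return shadows / len(values), highlights / len(values)


# Made-up 8 bit channel values from a capture with blown highlights.
sample = [0, 0, 12, 80, 128, 200, 255, 255, 255, 255]
shadow_share, highlight_share = clipped_fraction(sample)
print(f"shadows at limit: {shadow_share:.0%}, highlights at limit: {highlight_share:.0%}")
```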

Color Mode

RGB (Red/Green/Blue color channels)
  Additive color
  Most common color mode for digital images
  Mimics human visual system

Color Mode

CMYK (Cyan/Magenta/Yellow/Black)
  Subtractive color
  Commercial printing standard
  Most desktop color printers support RGB color files (CMYK conversion is internally managed)
  Limited color gamut

Color Mode

Lab color
  Single luminance (grey scale) channel and 2 opposing color channels
  Loosely represents the range of human vision
  Good for transforms

Color Profile Standards

The user-defined color profile assigned to the image files supports several informal standard configurations

sRGB
  Profile developed more than a decade ago by HP and Microsoft
  Represents the gamut of an average CRT monitor
  Very limited color palette
  New output devices are currently capable of exceeding this space
  Most commonly used profile (usually the default if not stated)

Color Profile Standards

Adobe RGB 1998
  Newer profile designed to support a wider palette of colors for higher quality printing
  Lower use than sRGB, but well recognized
  Maintains a color appearance consistent with sRGB devices
ProPhoto RGB
  A wide gamut color space designed for very high quality printing of photographic images
  Color appearance is highly inconsistent when used with devices that are not color managed, or are set to sRGB standards
  Despite the benefits of this color space, its use is quite limited due to the setup and management requirements
  Use with caution, as inaccurate color characteristics can occur with improperly managed devices

Image Processing

Post-capture modifications and manipulations to the original digital image file structure

The Controversy

Two primary schools of thought
  The digital master image files should remain untouched as they emerge from the capture device, and all subsequent processing should occur only on the surrogates
  Image processing will occur on the master capture file, with the intent of matching the original source material as closely as possible at the time of capture

Color Mode

RGB
  Standard image space for files
  Common, not likely to change
CMYK
  Avoid this space for all but specific commercial printing activities (even then try to ignore it)
Lab
  Great for processing transforms that can benefit from a luminance channel
    Sharpening
    Noise removal
  No color profile

File Formats

Master
  This is the high quality, large image generated from the capture device
Surrogates
  These are secondary files generated from the master file to be used for specific purposes

File Format Sets

Master
  Tiff
  This is intended to be the highest quality image
  Represents the asset derived from the € spent
  Lossless compression recommended
Compressed Jpgs
  File size reduced for easier management and dissemination, and to manage costs
  Lossy compression is acceptable within the use cases
  Often several sizes (large, small, thumbnail)
  Used for public display

Image Manipulations

Tone Scale
  To adjust tone scale you need to push or pull predetermined black and white values to defined positions on the histogram
  This requires the use of a calibrated reference target placed within the image
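Pushing the measured black and white patches to defined positions amounts to a linear remapping of pixel values. The sketch below assumes a simple linear stretch on one 8 bit channel; the function name, the measured patch readings (30 and 230) and the aim values (20 and 240) are all hypothetical, and real aim values would come from the documentation for the reference target in use.

```python
def adjust_tone_scale(value, black_in, white_in, black_out, white_out, max_value=255):
    """Linearly remap a pixel value so the measured patches land on their aim values."""
    if white_in <= black_in:
        raise ValueError("white patch reading must be higher than black patch reading")
    position = (value - black_in) / (white_in - black_in)
    remapped = black_out + position * (white_out - black_out)
    # Keep the result inside the valid range for the channel.
    return max(0, min(max_value, round(remapped)))


# Hypothetical readings from the target in the capture, and hypothetical aim values.
measured_black, measured_white = 30, 230
aim_black, aim_white = 20, 240

for pixel in (30, 130, 230):
    print(pixel, "->", adjust_tone_scale(pixel, measured_black, measured_white, aim_black, aim_white))
```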

Image Manipulations

Sharpening
  Sharpening works by increasing the contrast at edges in an image. This change in contrast fools the human visual system into believing that the image is sharper
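The edge-contrast effect can be seen on a single row of pixel values. This sketch uses unsharp masking with a 3-sample mean blur, which is one common sharpening technique and an assumption here rather than the method behind any particular tool; the function names and the sample row are made up.

```python
def box_blur(row):
    """3-sample mean blur; the end samples reuse their nearest neighbour."""
    last = len(row) - 1
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, last)]) / 3
        for i in range(len(row))
    ]


def unsharp_mask(row, amount=1.0, max_value=255):
    """Add back the difference between the row and its blurred copy."""
    sharpened = []
    for original, soft in zip(row, box_blur(row)):
        value = original + amount * (original - soft)
        sharpened.append(max(0, min(max_value, round(value))))
    return sharpened


# A soft step from dark (50) to light (150).
edge = [50, 50, 50, 150, 150, 150]
print(unsharp_mask(edge))  # [50, 50, 17, 183, 150, 150]
```

The flat areas are untouched, while the dark side of the edge gets darker and the light side lighter. No detail is added; only local contrast changes.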

Image Manipulation

Sharpening

Cropping

Cropping is the permanent removal of unwanted parts of the image
Formally determine where the borders of your images should be
  For research purposes the entire page should be represented
  For access and content related scanning, cropping to the textual areas of the page may be desired
Failure modes
  What determines a crop or image capture that is unacceptable, requiring reprocessing or a new capture?
  Formalize this

Skew/Rotation

Skew/rotation occurs when the edges of the source material are not square to the edges of the digital image
Failure mode
  Determine what percent is unacceptable
  Formalize these criteria
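A formal skew criterion can be written down as a measurement plus a threshold. The sketch below measures skew as an angle in degrees from two points picked along a page edge that should be horizontal; the use of degrees, the 1 degree tolerance, the function names and the coordinates are all assumptions for illustration, and a project would substitute its own agreed measure and limit.

```python
import math


def skew_angle_degrees(x1, y1, x2, y2):
    """Angle between a nominally horizontal page edge and the image's horizontal axis."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def skew_acceptable(angle_degrees, tolerance_degrees=1.0):
    """Apply the project's formal tolerance (hypothetical default of 1 degree)."""
    return abs(angle_degrees) <= tolerance_degrees


# Two pixel coordinates along the top edge of the page in two made-up captures.
slight = skew_angle_degrees(100, 200, 2100, 210)
severe = skew_angle_degrees(100, 200, 2100, 270)

print(f"{slight:.2f} degrees, acceptable: {skew_acceptable(slight)}")
print(f"{severe:.2f} degrees, acceptable: {skew_acceptable(severe)}")
```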

White Balance

White balance is a color balancing function used to address the color differences imparted by varying light sources
  The human visual system does this automatically in the brain, removing the real color cast imparted by the source illuminant and giving us the perception that most lights are white
  Think of the differences evident when you have a desktop incandescent bulb in a room lit by fluorescent light
  This is also important in the environment where your image processing occurs

White Balance

Most white balance is preset within the capture system; however, fine tuning or custom profiles can be applied in the processing stage
Neutral 18% grey references are used to generate a custom balance
When adjusting tone scale in Photoshop, neutral grey adjustment can be used to correct white balance inconsistencies
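A custom balance from a neutral grey reference can be sketched as per-channel gains that make the grey patch read equal in red, green and blue. This is a simplified illustration, not how any particular capture package works: the function names and the patch reading are made up, and the gains are applied directly to 8 bit values without regard to gamma or color profiles.

```python
def grey_patch_gains(patch_rgb):
    """Per-channel gains that make the measured grey patch neutral at its mean level."""
    r, g, b = patch_rgb
    if min(r, g, b) <= 0:
        raise ValueError("patch reading must be above zero in every channel")
    target = (r + g + b) / 3
    return target / r, target / g, target / b


def apply_gains(pixel_rgb, gains, max_value=255):
    """Scale each channel by its gain and keep the result in range."""
    return tuple(
        max(0, min(max_value, round(channel * gain)))
        for channel, gain in zip(pixel_rgb, gains)
    )


# Made-up reading of the grey patch under warm light: too much red, too little blue.
patch = (140, 120, 100)
gains = grey_patch_gains(patch)

print(apply_gains(patch, gains))            # the patch itself becomes neutral
print(apply_gains((200, 180, 150), gains))  # the same gains applied to another pixel
```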

Quality Control/Assurance

Imaging and image processing are a highly repetitive, human-dependent set of processes and are therefore highly susceptible to regular error

Control vs. Assurance

Control consists of in-process activities to ensure quality in the creation of the products (digital images)
Assurance is focused on an evaluation of the processes used and generally takes place outside of the creation process

Quality Control

Processes built into the imaging workflow to ensure that the creation of digital images is
  Consistent
  Accurate
  Repeatable
Often automated, these processes are inherently part of the imaging workflow

Quality Assurance

The Quality Assurance Audit
  Formal… informal just does not work
  Existing toolsets developed for a variety of manufacturing-based industries are highly effective
    TQM
    Six Sigma
    Etc.
  Takes place fully outside of the imaging processes

Quality Assurance Testing

What to test for
  Imaging
  File structure metrics
    Naming, page counts
  System/Network (positioning, backup, etc.)
  Metadata
    Structure
    Accuracy
    Completeness
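Naming and page count checks are good candidates for a scripted audit. The sketch below is only an illustration: the naming convention (a collection code, an underscore, a four-digit page number and a .tif extension), the function name and the file list are all hypothetical, and a real audit would encode the project's own documented convention.

```python
import re

# Hypothetical convention: <collection code>_<4-digit page number>.tif
NAME_PATTERN = re.compile(r"^(?P<collection>[A-Za-z0-9]+)_(?P<page>\d{4})\.tif$")


def audit_filenames(filenames, expected_pages):
    """Return (names that break the convention, page numbers with no valid file)."""
    bad_names = []
    pages_found = set()
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            bad_names.append(name)
        else:
            pages_found.add(int(match.group("page")))
    missing = sorted(set(range(1, expected_pages + 1)) - pages_found)
    return bad_names, missing


# Made-up delivery for a five-page item: one misnamed file and one missing page.
files = ["COLL_0001.tif", "COLL_0002.tif", "COLL_004.tif", "COLL_0005.tif"]
bad, missing = audit_filenames(files, expected_pages=5)

print("names to fix:", bad)
print("pages without a valid file:", missing)
```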

Color Management

One of the most critical, and often ignored, components of a successful digitization project is a well planned color management strategy

Color Management

Within any imaging and processing system you need to ensure that consistent color is displayed from device to device, and that a file's color metrics are electronically recognized
Technology required
  Capture reference targets
  Color profiles / ICC

Color Reference Targets

Allows a formal measured reference to be associated with the image (future proofing)

Color Management Technology

Color meters (basic screen calibration)
  Absorptive measurements
  Less dynamic than spectrophotometers
Spectrophotometers (advanced CM)
  Can measure the intensity of light as a function of the wavelength of the light
    Light absorption
    Diffuse
    Specular

CM Standards

ICC (International Color Consortium)
  Works through a standardized Color Matching Module (CMM) connection space
  Not an ideal solution, but one that has been very well adopted by most imaging related hardware and software vendors
ColorSync (Apple Computer)
  Apple solution to color management
  Part of the Macintosh system software
  Generally plays well with others; occasionally some fiddling is necessary (ICC integrated)
  Hands-off approach

Further Reading and Resources

DRI and Digital File Format Choices Factsheet: http://dri.ie/sites/default/files/files/dri-factsheets-file-formats.pdf
DRI Long-Term Digital Preservation Factsheet: http://tinyurl.com/hbp28xe
Online Resources for Digitisation Projects: http://dri.ie/digitisation-resources - includes resources for Project Planning, File Formats, Audio & Audiovisual, Hardware, Metadata & Vocabularies and Policy.
Trinity College Dublin Digital Collections Repository: https://www.tcd.ie/Library/dris/digital.php