BIODIVERSITY Digitizing Herbarium Sheets World Class Rapid Digitization of Cultural Heritage


DIGITIZING HERBARIUM SHEETS


Having processed over 8 million herbarium sheets by the end of 2016, Picturae has approximately doubled the number of herbarium sheet images available worldwide in 3 years, and we are ready for the challenge ahead! It is estimated that there are well over 350 million herbarium sheets in the world, and that number is growing by the day. We are determined to see them all digitized in our lifetime. How about you?


With one of the largest natural history collections in the world, Naturalis Biodiversity Center is a centre of excellence in biodiversity research. In 2012 Naturalis published a request for tender for the digitization and transcription of its collection of herbarium sheets within a very short timeframe. It was clear from the start that if Picturae wanted to succeed, we had to come up with an innovative and efficient solution.

This led to the idea of an industrialized approach: to standardize and automate the process wherever possible.

The project started in August 2013 with the target of digitizing millions of herbarium sheets and making millions of data entries within a mere 18 months. The key requirements of this project were careful handling of the herbarium sheets, working within a guaranteed plan that was easy to scale up, and the constant delivery of high quality images and label information with a 100% guarantee on consistency and completeness. A new digitization system was built with a conveyor belt as the core element. The staff carefully positioned the objects for capture and returned them to the collection in the same order. After meticulous consideration of the different stages of the workflow, it became possible to automate everything except the physical handling of the objects.

Picturae set up a digitization effort that has yet to be matched. Approximately five million herbarium sheets were digitized within nine months, all of which were individually validated and accounted for. The temporary studio had three conveyor belt systems running continuously and up to 200,000 sheets were processed in a single week, all with the same attention and care in handling.


Picturae is very proud of these accomplishments, in which we tapped into the multi-disciplinary experience of our staff, based on a decade of sophisticated digitization of cultural heritage collections worldwide. We have learned many lessons and continue to do so, as every herbarium project seems to pose new challenges that we are pleased to meet head on!

Changes in the herbariums are, however, on a much larger scale when these vast collections suddenly become available to scientists and the public. No more time needs to be spent on daily digitization tasks; instead, staff are able to focus on the core tasks of doing research and making discoveries. In a climate in which loans are decreasing, a digital collection makes it possible to pinpoint a specimen of interest and to compare it with others with relative ease.

New discoveries are made within the collections and sharing the information has never been so easy! Big-data analysis is just around the corner. Search engines give new ways of accessing the collection, and statistics have never been more reliable thanks to the number of samples available. Picturae is extremely proud that we have helped to open these doors and will continue to fine-tune and develop solutions that provide the economical means of bringing the massive biodiversity collections into the digital age.

And the rest, as they say, would be history, were it not that our story continues! Since the Naturalis project set a new standard for herbarium digitization, we have had the privilege of working on collections from all over the world. There is so much more to come!

Picturae has digitized, or is presently in the process of digitizing, herbariums based in the Netherlands, the United States of America, the United Kingdom, France, Belgium, Norway and Germany.

Workflow

1. Preparation
• Selection boxes
• Extract vapour
• Apply box barcode
• Apply cover barcode

2. Place material on conveyor belt
• Spread on the conveyor belt
• Multisheet token
• Straighten
• Apply sheet barcode

3. Digitizing
• Read barcode
• Multisheet yes/no
• Apply ICC profile
• Readout color
• Readout sharpness
• Feedback retake

4. Processing
• Rotating
• Cropping
• Readout target
• Multisheet color code
• Merge metadata to CSV file format
• Save deliverables

5. Packing
• Order remains intact
• Logistic management
• Return material

6. Metadata entry
• Cover & label description
• Look-up lists
• Linking to databases
• Multisheet processing


Industrial Digitization

Picturae has gained extensive experience from the digitization of cultural heritage collections over more than fifteen years. Digitization systems are custom built to suit a range of originals. One of the biggest challenges is to prevent errors that can only be solved by making a retake. A retake, that is, a new digital image of the original object, becomes necessary when the first image has been rejected by our internal quality control or the client’s quality control. This may be the result of a technical error deriving from the equipment or of a mistake in post-processing. Retaking is time-consuming because it is usually not a standardized operation. Developing a workflow that prevents errors was one of the biggest challenges.

The digitization software has been designed with automatic failsafe procedures to prevent technical defects in the digital images. Firstly, the hardware needs to be calibrated to ensure the system is performing according to agreed specifications. To calibrate the system, it is necessary to have both the parameters and the bandwidth within which they may vary. These parameters are provided by the Dutch Metamorfoze guidelines and the FADGI guidelines from the Library of Congress in the US; both sets of guidelines have been developed to test the performance of digitization equipment. Technical targets are used to analyze colour accuracy, sharpness, resolution, noise, tonal distribution, etc., and dedicated software is used to measure the targets and provide an objective result.

The specimens in a herbarium collection clearly have 3D characteristics. For this reason we have developed a target to measure the depth of field (DoF). This is done by measuring slanted edge targets fixed at different heights. If the results are within the bandwidth, we know that the system will provide a sharp image over a depth of field range of 4 centimetres. This is sufficient for most specimens mounted on a herbarium sheet, and even the biggest specimen will still give a sharp image.

These targets have been chosen because they use ISO standards and/or because they are widely accepted by the digital imaging community.

The use of technical targets has always been a standard part of our workflow, but with the advance of software programs that measure them objectively, other opportunities have presented themselves. Picturae developed a web-based analysis tool to check the system’s performance easily and efficiently. This technology was incorporated in the new workflow software for the herbarium sheet digitization system. Every day before starting production, run targets are captured and analyzed to ensure that the system is still performing according to the parameters. Production can only begin after this first phase is successful.
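The daily check described above amounts to comparing each measured parameter with its permitted bandwidth before production is released. The following is a minimal Python sketch of such a gate. The parameter names, the numeric limits and the helper `production_may_start` are illustrative assumptions; they are not values from the Metamorfoze or FADGI guidelines and not the actual workflow software.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Permitted bandwidth for one measured parameter (hypothetical limits)."""
    low: float
    high: float

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


# Illustrative limits only; real limits come from the agreed specifications.
TOLERANCES = {
    "colour_error": Tolerance(0.0, 4.0),
    "sharpness": Tolerance(0.8, 1.1),
    "noise": Tolerance(0.0, 2.0),
}


def failed_parameters(measured: dict[str, float]) -> list[str]:
    """Return the parameters that are missing or outside their bandwidth."""
    failures = []
    for name, tolerance in TOLERANCES.items():
        value = measured.get(name)
        if value is None or not tolerance.accepts(value):
            failures.append(name)
    return failures


def production_may_start(measured: dict[str, float]) -> bool:
    """Production is released only when every parameter passes."""
    return not failed_parameters(measured)


# Made-up measurements from a morning target capture.
morning_run = {"colour_error": 2.7, "sharpness": 0.93, "noise": 1.1}
print(production_may_start(morning_run))
```

A missing measurement is treated as a failure here, so an incomplete target analysis can never release production by accident.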

Another measure of control is implemented by shooting an ‘object level target’ in every image. This target is analyzed every time an image has been captured. If one of the parameters is rejected, the software stops the system. The conveyor belt is moved back until the rejected item is in the right position to be digitized again. All images captured after the rejected one are deleted, and the process starts again from the new digital image. By building in this safety procedure we are able to ensure that the files which are going to be processed are technically sound and will not be rejected by the client’s quality control. In principle this is the key to the Picturae herbarium solution: validation of every individual specimen image.
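The stop-and-rewind procedure described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the actual workflow software: `digitize`, `target_ok`, `lag` and `max_captures` are hypothetical names, and the idea that analysis finishes a fixed number of captures later (`lag`) is assumed only to show why later images have to be deleted.

```python
from collections import deque


def digitize(sheet_ids, target_ok, lag=2, max_captures=3):
    """Simulate capture with stop-and-rewind on a rejected object-level target.

    target_ok(sheet_id, attempt) stands in for the target analysis and
    returns True when every measured parameter is within specification.
    lag is the assumed number of later sheets already captured when the
    analysis of a sheet completes.
    """
    accepted = []                            # validated images, in belt order
    attempts = {sid: 0 for sid in sheet_ids}
    pending = deque()                        # positions captured, not yet analyzed
    position = 0                             # next belt position to capture

    while position < len(sheet_ids) or pending:
        if position < len(sheet_ids):
            sid = sheet_ids[position]
            if attempts[sid] >= max_captures:
                raise RuntimeError(f"{sid} captured {max_captures} times without approval")
            attempts[sid] += 1
            pending.append(position)
            position += 1

        # Analyze the oldest capture once the lag is filled or the belt is empty.
        if len(pending) > lag or position == len(sheet_ids):
            index = pending.popleft()
            sid = sheet_ids[index]
            if target_ok(sid, attempts[sid]):
                accepted.append(sid)
            else:
                pending.clear()              # delete every image captured after it
                position = index             # move the belt back to the rejected sheet

    return accepted, attempts


def flaky_target(sheet_id, attempt):
    # Hypothetical analysis result: sheet "B" fails on its first capture only.
    return not (sheet_id == "B" and attempt == 1)


accepted, attempts = digitize(["A", "B", "C", "D"], flaky_target)
print(accepted)   # ['A', 'B', 'C', 'D']
print(attempts)   # {'A': 1, 'B': 2, 'C': 2, 'D': 2}
```

In this run sheets C and D are captured twice even though only B was rejected, because their first images were taken after the rejected one and were discarded with it. The accepted list still follows the original belt order.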

Other errors for which images can be rejected in the quality control phase of a digitization project are file naming errors and post-processing errors. To prevent file naming errors, every scanned item has to have a unique identifier. For the digitization of the Naturalis herbarium sheets we were permitted to fix data matrix barcode labels on the sheets, giving each sheet a unique identifier. The data matrix labels were also applied to the boxes and the cover sheets to keep track of the contents and box order. The hierarchy of these elements is retained in the metadata, allowing metadata to be passed from one level to another, so that, for example, the information on a folder is automatically transferred to the sheets inside it.
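One way to picture the box, cover and sheet hierarchy is as a chain of barcoded items in which each item inherits the metadata of its parent. The sketch below assumes a simple parent link per item; the class `Item`, the barcode strings and the metadata values are made up for illustration and do not describe the real data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """A barcoded box, cover or sheet (hypothetical data model)."""
    barcode: str
    kind: str                                # "box", "cover" or "sheet"
    parent: Optional[Item] = None
    metadata: dict = field(default_factory=dict)

    def inherited_metadata(self) -> dict:
        """Merge metadata from the top of the hierarchy down to this item.

        Values set lower in the hierarchy override inherited ones.
        """
        merged = self.parent.inherited_metadata() if self.parent else {}
        merged.update(self.metadata)
        return merged


# Made-up example values.
box = Item("BOX-0001", "box", metadata={"box_order": 1})
cover = Item("COV-0042", "cover", parent=box,
             metadata={"region": "Europe", "taxon": "Quercus robur"})
sheet = Item("SHT-123456", "sheet", parent=cover,
             metadata={"multisheet": False})

print(sheet.inherited_metadata())
```

Because the sheet only stores a link to its cover, information typed once for the cover reaches every sheet inside it without being re-entered.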

These data matrix labels in turn can be detected and read by the workflow software. If no data matrix is detected in the image, the image has no identifier and therefore cannot be saved. Again the conveyor belt moves backwards to position the item correctly so that a data matrix label can be applied. After the item has been captured and approved (it has a data matrix and all the technical parameters are in accordance with the specifications) the file undergoes post-processing. A colour profile is applied, the item is cropped and rotated, the herbarium logo and target with scale are added to the image and all the necessary metadata is saved in the header of the file. If there are no mistakes in the post-processing stage, the file is then ready for delivery to the herbarium.


Crowdsourcing

The transcription of biodiversity labels is no easy task. Most are handwritten and the various pieces of information are scattered over different parts of the label or over a multitude of labels. We know from the experience of transcribing almost 3 million labels for Naturalis over 18 months that it takes up to 2 months to train a new transcriber to do this on their own.

In contrast, we have had an extremely positive experience of presenting difficult transcription tasks to the general public over the last 5 years by means of crowdsourcing. This has led us to believe that it must be possible to harness the power of the crowd to input biodiversity data. Crowdsourcing could successfully be employed for simple tasks such as indexing the cover sheet data, interconnecting data, describing the sheet and adding geographical references. The platform could, however, also be used as a tool for the herbarium staff themselves, or for a group of herbariums working together on a specific project. As the users can work on the web-based platform using their own computer, time and place are not an issue. There is, of course, a quality assurance level built in to approve, or correct, the work.

The crowdsourcing initiative that Picturae developed with the City Archive of Amsterdam in 2011 is unequalled: a group of 8,000 people have successfully transcribed over 4 million records!

We haven’t just used crowdsourcing for simple transcription; there have been many dozens of projects involving image-tagging, georeferencing and data-linking. We successfully and quickly completed a project with microscopic glass slides for Naturalis, which was extremely popular with the crowd. We are looking forward to the next biodiversity project using this contemporary method, getting people involved and harnessing their knowledge.

It would be our pleasure to give you a demonstration and let you try it out. We think you will be pleasantly surprised by the power it holds and the sense of community present.

Quality

• Care for the original is paramount; all handling is done by trained staff, including holding, barcoding and replacing.
• QR coding direction for both digitization and data entry (and future use of the images on prints).
• The system used has been engineered and built from requirements by a team dedicated to herbarium sheet digitization, using over a decade of experience in digitizing delicate heritage originals.
• Photographic system with flash exposure, Metamorfoze quality reproduction, validated with www.Delt.ae and the FADGI standard.
• Foolproof interface with visual progress presentation.
• Validated large sharpness depth of >40 mm (suitable for the thickest herbarium specimen).
• Redundant digital infrastructure: integrated back-up monitored remotely via the internet.
• ISO 9001:2008 certified (international quality standard covering company processes), ISO/IEC 27001:2005 (information security management system) and ISO 14001:2004 (good environmental management).

Herbarium specimen mounting

When preparing for digitization, it may turn out that collections are not ready because the specimens are not mounted. For herbarium collection holders, mounting can be a time-consuming task.

Picturae and Grahal started to work together on the French e-ReColNat project, a consortium to bring natural history collections online. Grahal is responsible for the mounting of 1.5 million specimens in three years. With a team of 35 professionals, about 2,000 specimens per day are mounted. In 2008 Grahal had already mounted 1 million specimens for the Muséum national d’Histoire naturelle in Paris. Within both projects reversible techniques are used: the specimens are mounted with strips of gummed paper and/or sewn with needle and thread.

For each project or even (sub)collection the Grahal project team will set up a protocol taking into account all specific needs of a collection. Both reversible and non-reversible techniques can be part of the protocol. During the mounting process, services for label interpretation and barcoding can be included.

Durable, affordable storage

Managing a herbarium collection in many cases requires a huge amount of digital storage. Picturae can safely store your files in our two environmentally-friendly and secure storage areas. By entrusting this work to Picturae, it is no longer necessary for you to take responsibility for the maintenance, security and software updates that storage and hosting entail.

Technical Systems

For rapid and high quality transcription, Alembo has developed a dedicated application, DETA. DETA has built-in correction rules and can work with any look-up tables that are available.

Some collections are already partially transcribed in existing systems and, if preferred, the transcription can be done within that system. For instance, the Naturalis project was done with the Rapid Data Entry tool of the BRAHMS collection software.

Transcription Process

When digitizing millions of herbarium sheets it is almost impossible, or at least extremely expensive and time-consuming, to have the herbarium staff transcribe the labels.

Picturae and Alembo joined forces to work on the Naturalis project and have served many other herbariums since then. Alembo administers a transcription team that types the cover sheet data (i.e. geographic region and taxon name) and label information into a data collection system. The data can be combined with, or in, the images on the sheets, as the transcription requirements are different for every herbarium. Picturae provides the delivery after quality assurance.

Alembo has become the worldwide professional transcriber of herbarium sheets. At a speed of 60 sheets per hour per transcriber, a team of 60 professionals can transcribe a collection of 3 million sheets professionally and affordably in less than a year.
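The throughput claim above can be checked with simple arithmetic. The sketch below reads the 60 sheets per hour as a rate per transcriber and assumes an 8-hour working day; both readings are assumptions, and `working_days_needed` is a hypothetical helper.

```python
import math


def working_days_needed(sheets, sheets_per_hour, team_size, hours_per_day):
    """Working days for a team to transcribe a collection at a steady rate."""
    team_rate = sheets_per_hour * team_size      # sheets per hour for the team
    hours = sheets / team_rate
    return math.ceil(hours / hours_per_day)


# 3 million sheets, 60 sheets/hour per person, 60 people, 8-hour days (assumed).
days = working_days_needed(3_000_000, 60, 60, hours_per_day=8)
print(days)   # 105
```

Under these assumptions the team needs about 105 working days, which sits comfortably within a year and leaves room for difficult labels, review and rework.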

The quality of our transcription is considered to be more consistent, and at botanist-level quality, thanks to the unbiased and productive approach of the transcribers. Thanks to their extensive experience, the staff pick out anomalies in labels and deal with them efficiently. It is safe to say that the senior staff at Alembo have seen and processed more herbarium labels than any botanist would want to in a lifetime!


Presentation and Collection Management

• Optimal search possibilities: the digitization of originals makes it easier to search through a collection. If digital files are stored in our collection management system Memorix Maior, it becomes possible to provide additional metadata. The system also offers the option of interlinking files, such as multiple sheets found by the same collector.
• High quality online viewer: it is possible to display either a part of the collection, or the entire collection, on a public website. Students, biologists and other interested parties can thus easily view the collection whether they are at home or on the road, on their laptop, smartphone or tablet.
• Enrich your content: you can use your metadata to provide a map which shows where the herbarium sheets were found in a specific region in a specific period. You can easily visualize where different species were found, when and by whom.

In order to ensure high and consistent quality, the transcription is carried out in close cooperation with the herbarium and with Picturae. The process is under full supervision of the herbarium and consists of the following steps:

• The transcription interpretation is agreed. The goal is to ensure consistency with the existing herbarium sheet information held in current databases.
• A first batch is transcribed and thoroughly reviewed by all parties. In our experience this allows us to reach a more consistent interpretation.
• The interpretation rules are improved, and transcription begins with a small professional team of transcribers. The results are reviewed thoroughly, feedback is given and where necessary the interpretation rules are again modified.
• The team and the interpretation rules have now been established, reviewed and confirmed to be consistent. The transcription team is scaled up to meet the scheduled target dates.
• During the transcription process, sample tests are performed in order to ensure quality and consistency. The whole process is under full review. The end result is a consistent collection of herbarium sheets that can be accessed worldwide.
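The sample tests mentioned in the last step could, for example, compare a random draw of transcribed records with a reviewer's own reading and accept the batch only when the error rate stays under an agreed limit. The sketch below is one possible form of such a check; the function `sample_check`, the acceptance threshold, the barcodes and the records are all hypothetical and are not taken from the process described above.

```python
import random


def sample_check(batch, review, sample_size, max_error_rate, seed=0):
    """Compare a random sample of transcribed records with a reviewer's reading.

    batch maps sheet barcode -> transcribed record; review(barcode) returns
    the reviewer's reference record. Returns (passed, error_rate, mismatches).
    """
    if not batch or sample_size < 1:
        raise ValueError("need a non-empty batch and a positive sample size")
    rng = random.Random(seed)                # fixed seed keeps the draw reproducible
    barcodes = rng.sample(sorted(batch), min(sample_size, len(batch)))
    mismatches = [code for code in barcodes if batch[code] != review(code)]
    error_rate = len(mismatches) / len(barcodes)
    return error_rate <= max_error_rate, error_rate, mismatches


# Made-up batch in which one taxon name was mistyped.
batch = {
    "S1": {"taxon": "Quercus robur"},
    "S2": {"taxon": "Fagus sylvatica"},
    "S3": {"taxon": "Betula pendla"},
    "S4": {"taxon": "Acer campestre"},
}
reference = dict(batch, S3={"taxon": "Betula pendula"})

passed, error_rate, mismatches = sample_check(
    batch, reference.__getitem__, sample_size=4, max_error_rate=0.1
)
print(passed, error_rate, mismatches)   # False 0.25 ['S3']
```

A failed sample would feed back into the review step, for instance by correcting the batch or refining the interpretation rules before scaling up further.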

[Process diagram: Agree on Rules and Interpretation → Review First Batches → Update Interpretation Rules → Review Following Batches → Scale Up → Samples and Feedback → Finalized Batches]


More information: T +31 (0)72 - 53 20 444 [email protected] www.picturae.com

Integration

[Integration diagram. Labels shown: Search API (JSON); Component API; ArchiveKitchen components: Herbarium, Media library, Web expositions (data, layout and design); Memorix Collection management; DAM (Digital Asset Management); Memorix Archives; Enrichment; Indexing; Databases; Adlib; EMU; DMS; Thesauri: AAT, Geonames, DBpedia, (linked) OpenData, own knowledge sources; EAD (XML); Joomla; Drupal; OpenData; OAI-PMH.]

World Class Rapid Digitization of Cultural Heritage

More information: T +31 (0)72 - 53 20 444 [email protected] www.digitalherbarium.com