BIODIVERSITY
Digitizing Herbarium Sheets
World Class Rapid Digitization of Cultural Heritage
DIGITIZING
HERBARIUM SHEETS
Getting the material
Having processed over 8 million herbarium sheets by the end of 2016,
Picturae has approximately doubled the number of herbarium sheet
images available worldwide in 3 years, and we are ready for the challenge
ahead! It’s estimated there are well over 350 million herbarium sheets in the
world, and that number is growing by the day. We are determined to see it all
digitized in our lifetime. How about you?
With one of the largest natural history collections in
the world, Naturalis Biodiversity Center is a centre of
excellence in biodiversity research. In 2012 Naturalis
published a request for tender for the digitization and
transcription of their collection of herbarium sheets
within a very short timeframe. It was clear from the start
that if Picturae wanted to succeed we had to come up
with an innovative and efficient solution.
This led to the idea of an industrialized approach;
to standardize and automate the process wherever
possible.
The project started in August 2013 with the target of
digitizing millions of herbarium sheets and making
millions of data entries within a mere 18 months.
The key requirements of this project were: careful
handling of the herbarium sheets, working within
a guaranteed plan that was easy to scale up, and the
constant delivery of high-quality images and label
information with a 100% guarantee on consistency and
completeness. A new digitization system was built with
a conveyor belt as the core element. The staff carefully
manipulated the objects to position them to be
captured, and returned them to the collection in the
same order. After meticulous consideration of the
different stages of the workflow it became possible to
automate everything except the physical handling of the
objects.
Picturae set up a digitization effort that has yet to
be matched. Approximately five million herbarium
sheets were digitized within nine months, all of which
were individually validated and accounted for.
The temporary studio had three conveyor belt systems
running continuously and up to 200,000 sheets were
processed in a single week, all with the same attention
and care in handling.
Apply Barcode
Picturae is very proud of these accomplishments,
in which we tapped into the multidisciplinary
experience of our staff, based
on a decade of sophisticated digitization of
cultural heritage collections worldwide. We
have learned many lessons and
continue to do so, as every herbarium
project seems to pose new challenges that
we are pleased to meet head on!
Changes in the herbariums are, however,
on a much larger scale when these vast
collections suddenly become instantly
available to scientists and the public.
No more time needs to be spent on daily
digitization tasks; instead, staff are able to
focus on the core tasks of doing research
and making discoveries. In a climate in
which loans are decreasing, a digital
collection makes it possible to pinpoint a
specimen of interest and to compare
it with others with relative ease.
New discoveries are made within the
collections and sharing the information has
never been so easy! Big-data analysis is just
around the corner. Search engines give new
ways of accessing the collection,
and statistics have never been more
reliable thanks to the number of samples
available. Picturae is extremely proud
that we have helped to open these
doors and will continue to fine-tune
and develop solutions that provide the
economical means of bringing the massive
biodiversity collections into the digital age.
The rest, as they say, is history, except
that our story continues! Since
the Naturalis project set a new standard
for herbarium digitization we have had the
privilege of working on collections from all
over the world. There is so much more to
come!
Picturae has digitized, or is presently in
the process of digitizing, herbariums based
in the Netherlands, the United States of
America, the United Kingdom, France,
Belgium, Norway and Germany.
Workflow
1 Preparation
• Selection boxes
• Extract vapour
• Apply box barcode
• Apply cover barcode
2 Place material on conveyor belt
• Spread on the conveyor belt
• Multisheet token
• Straighten
• Apply sheet barcode
3 Digitizing
• Read barcode
• Multisheet yes/no
• Apply ICC profile
• Readout color
• Readout sharpness
• Feedback retake
4 Processing
• Rotating
• Cropping
• Readout target
• Multisheet color code
• Merge metadata to CSV file format
• Save deliverables
5 Packing
• Order remains intact
• Logistic management
• Return material
6 Metadata entry
• Cover & label description
• Look-up lists
• Linking to databases
• Multisheet processing
Diagram labels: TARGET, Operating screen
Placement on conveyor belt
Positioning
Multisheet yes/no
Industrial Digitization
Picturae has gained extensive experience from the digitization of cultural heritage
collections over more than fifteen years. Digitization systems are custom built to suit a range
of originals. One of the biggest challenges is preventing errors that can only be solved by
making a retake. A retake, that is, a new digital image of the original object, becomes necessary
when the first image has been rejected by our internal quality control or the client’s quality
control. This may be the result of a technical error in the equipment or of a mistake in
post-processing. Retaking is time-consuming because it is usually not a standardized
operation. Developing a workflow process that prevents such errors was therefore one of
the biggest challenges.
The digitization software has been designed with
automatic failsafe procedures to prevent technical
defects in the digital images. Firstly, the hardware needs
to be calibrated to ensure the system is
performing according to agreed specifications.
To calibrate the system, it is necessary to have both
the parameters and the bandwidth within which they
may vary. These parameters are provided by the Dutch
Metamorfoze guidelines and the FADGI guidelines from
the Library of Congress in the US; both sets
of guidelines have been developed to test the
performance of the digitization equipment.
Technical targets are used to analyze colour accuracy,
sharpness, resolution, noise, tonal distribution, etc., and
dedicated software is used to measure the targets and
provide an objective result.
The specimens in a herbarium collection clearly have
3D characteristics. For this reason we have developed a
target to measure the depth of field (DoF).
This is done by measuring slanted edge targets fixed at
different heights. If the results are within the bandwidth,
we know that the system will provide a fully
sharp image over a depth of field range of 4 centimetres.
This is sufficient for most specimens mounted on a
herbarium sheet, and even the biggest specimen will still
give a sharp image.
These targets have been chosen because they use
ISO standards and/or because they are widely
accepted by the digital imaging community.
The use of technical targets has always been a
standard part of our workflow, but with the advance of
software programs to objectively measure them,
other opportunities have presented themselves.
A web-based analyzing tool was developed by
Picturae to facilitate checking the system’s
performance, easily and efficiently. This technology was
incorporated in the new workflow software for the
herbarium sheet digitization system. Every day before
starting production, run targets are captured and
analyzed to ensure that the system is still performing
according to the parameters. Production can only begin
after this first phase is successful.
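The daily check described above can be pictured as a simple gate: each measured parameter must fall inside its agreed bandwidth before production may begin. The sketch below is only an illustration. The parameter names and the numeric limits are hypothetical and are not taken from the Metamorfoze or FADGI guidelines, and the function names are not those of the Picturae software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Allowed range ("bandwidth") for one measured parameter."""
    low: float
    high: float

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical limits for illustration only; real limits come from the
# guideline level (Metamorfoze or FADGI) agreed for the project.
TOLERANCES = {
    "colour_delta_e": Tolerance(0.0, 4.0),
    "sharpness_mtf50": Tolerance(0.30, 0.60),
    "noise": Tolerance(0.0, 2.0),
}


def failed_parameters(measurements, tolerances=TOLERANCES):
    """Return the sorted names of parameters that are missing or out of range."""
    failed = []
    for name, tolerance in tolerances.items():
        value = measurements.get(name)
        if value is None or not tolerance.accepts(value):
            failed.append(name)
    return sorted(failed)


def production_may_start(measurements, tolerances=TOLERANCES):
    """Production is allowed only when every parameter is within its bandwidth."""
    return not failed_parameters(measurements, tolerances)
```

A missing measurement is treated as a failure here, so an incomplete target readout blocks production in the same way as an out-of-range value.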
Another measure of control is implemented by
shooting an ‘object level target’ in every image.
This target is analyzed every time an image has been
captured. If one of the parameters is rejected the
software will stop the system. The conveyor belt
runs backwards until the rejected item is back in
position to be digitized again. All images captured
after the rejected one are deleted and the process
resumes from the new digital image. By building in this
safety procedure we are able to ensure that the files
which are going to be processed are technically sound
and will not be rejected by the client’s quality control. In
principle this is the key to the Picturae
herbarium solution: validation of every individual
specimen image.
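The stop, rewind and retake behaviour can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the actual workflow software: `capture` and `validate` are hypothetical callables standing in for the camera and the object level target analysis, and `lag` is an assumed number of items that may already have been captured by the time an earlier image is analysed.

```python
def digitize(sheet_ids, capture, validate, lag=2, max_retakes=3):
    """Capture sheets in belt order, validating every image.

    When an image is rejected, all images captured after it are discarded
    and the belt position is rewound to the rejected sheet for a retake.
    """
    accepted = []   # validated images, in belt order
    pending = []    # (belt index, image) captured but not yet analysed
    retakes = {}    # belt index -> number of retakes so far
    position = 0    # index of the next sheet under the camera

    while position < len(sheet_ids) or pending:
        if position < len(sheet_ids):
            pending.append((position, capture(sheet_ids[position])))
            position += 1

        # Analyse the oldest pending image once enough later items have
        # been captured, or when the end of the belt has been reached.
        if pending and (len(pending) > lag or position == len(sheet_ids)):
            index, image = pending.pop(0)
            if validate(image):
                accepted.append(image)
            else:
                retakes[index] = retakes.get(index, 0) + 1
                if retakes[index] > max_retakes:
                    raise RuntimeError(f"sheet {sheet_ids[index]!r} keeps failing")
                pending.clear()    # delete images taken after the rejected one
                position = index   # run the belt back to the rejected sheet
    return accepted
```

Because later images are discarded and retaken, the accepted list always follows the physical order of the sheets, which matches the requirement that material is returned to the collection in the same order.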
Other errors by which images can be rejected in the
quality control phase of a digitization project are caused
by file naming errors and post processing
errors. To prevent file naming errors every scanned item
has to have a unique identifier. For the
digitization of the Naturalis herbarium sheets we were
permitted to fix data matrix barcode labels on the sheets
giving each sheet a unique identifier. The data matrix
labels were also applied to the box and the coversheets
to keep track of the contents and box order. The
hierarchy of these elements is retained in the metadata,
so that metadata can be passed down from one level to
the next; for example, the information on a
folder is automatically transferred to the sheets inside it.
These data matrix labels in turn can be detected and
read by the workflow software. If there is no data
matrix detected in the image it has no identifier and
therefore cannot be saved. Again the conveyor belt
will move backwards to position the item correctly in
order to apply a data matrix label. After the item has
been captured and approved (it has a data matrix and
all the technical parameters are in accordance with the
specifications) the file undergoes post processing. A
colour profile is applied, the item is cropped and rotated,
the herbarium logo and target with scale are added to
the image and all the necessary metadata is saved in the
header of the file. If there are no mistakes in the post
processing stage, the file is then ready for delivery to
the herbarium.
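The identifier rule and the box, cover and sheet hierarchy can be illustrated as follows. This is a sketch under assumptions: the `Scan` record, the field names and the `.tif` file extension are hypothetical, and the barcode values are taken as already decoded (no barcode library is involved).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scan:
    kind: str                      # "box", "cover" or "sheet"
    barcode: Optional[str]         # decoded data matrix value, None if not detected
    info: dict = field(default_factory=dict)


class MissingIdentifier(Exception):
    """Raised when an item has no data matrix and so cannot be saved."""


def build_records(scans):
    """Turn scans in belt order into one metadata record per sheet.

    Each sheet inherits the information of the box and cover that precede
    it, and is named after its own barcode.
    """
    box = cover = None
    records = []
    for position, scan in enumerate(scans):
        if not scan.barcode:
            raise MissingIdentifier(f"item {position} has no data matrix")
        if scan.kind == "box":
            box, cover = scan, None    # a new box starts without a cover
        elif scan.kind == "cover":
            cover = scan
        elif scan.kind == "sheet":
            record = {}
            for parent in (box, cover):
                if parent is not None:
                    record.update(parent.info)
            record.update(scan.info)
            record["box_id"] = box.barcode if box else None
            record["cover_id"] = cover.barcode if cover else None
            record["sheet_id"] = scan.barcode
            record["filename"] = f"{scan.barcode}.tif"  # extension is an assumption
            records.append(record)
        else:
            raise ValueError(f"unknown item kind: {scan.kind!r}")
    return records
```

Naming each file after the sheet barcode means a file name can never be typed by hand, which is one way to rule out the file naming errors mentioned above.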
Placing multisheet coin
Careful positioning
Depth of field and sharpness readout
The transcription of biodiversity labels is no easy
task. Most are handwritten and the various pieces of
information are scattered over different parts of the
label or over a multitude of labels. We know from the
experience of transcribing almost 3 million labels for
Naturalis over 18 months, that it takes up to 2 months to
train a new transcriber to do this on their own.
In contrast, we have had an extremely positive
experience of presenting difficult transcription tasks to
the general public over the last 5 years by means
of crowdsourcing. This has led us to believe that it must
be possible to harness the power of the crowd to input
biodiversity data. Crowdsourcing could successfully be
employed for simple tasks such as indexing the cover
sheet data, interconnecting data, describing the sheet
and adding geographical references. The platform could,
however, also be used as a tool for the
herbarium staff themselves, or for a group of
herbariums working together on a specific project.
As the users can work on the web-based platform
using their own computer, time and place aren’t an
issue. There is, of course, a quality assurance level built
in to approve, or correct, the work.
The crowdsourcing initiative that Picturae
developed with the City Archive of Amsterdam in
2011 is unequalled: a group of 8,000 people have
successfully transcribed over 4 million records!
We haven’t just used crowdsourcing for simple
transcription; there have also been many dozens of
projects involving image-tagging, georeferencing and
data-linking. We successfully and quickly completed
a project with microscopic glass-slides for Naturalis,
which was extremely popular with the crowd. We are
looking forward to the next biodiversity project using
this contemporary method, getting people involved and
harnessing their knowledge.
It would be our pleasure to give you a demonstration
and let you try it out. We think you will be pleasantly
surprised by the power it holds and the sense of
community present.
Quality
• Care for the original is paramount; all handling is done by trained staff,
including holding, barcoding and replacing.
• QR coding direction for both digitization and data entry (and future use of
the images on prints).
• The system used has been engineered and built from requirements by a team dedicated to herbarium sheets
digitization, using over a decade of experience in digitizing delicate heritage originals.
• Photographic system with flash exposure, Metamorfoze quality reproduction, validated with
www.Delt.ae and the FADGI standard.
• Foolproof interface with visual progress presentation.
• Validated large sharpness depth of >40 mm (suitable for the thickest herbarium specimen).
• Redundant digital infrastructure: integrated back-up monitored remotely via the internet.
• ISO 9001:2008 certified (international quality standard covering company processes), ISO/IEC 27001:2005
(information security management system) and ISO 14001:2004 (good environmental management).
Crowdsourcing
Apply sheet barcode
Some collections are not ready for digitization
because the specimens are not yet mounted. For
herbarium collection holders, mounting
can be a time-consuming task.
Picturae and Grahal started to work together on the
French e-ReColNat project, a consortium to bring
natural history collections online. Grahal is responsible
for the mounting of 1.5 million specimens in three years.
With a team of 35 professionals, about 2,000 specimens
per day are mounted. In 2008 Grahal had already mounted
1 million specimens for the Muséum national d’Histoire
naturelle in Paris. In both projects reversible
techniques are used: the specimens are mounted with
strips of gummed paper and/or sewn on
with needle and thread.
For each project or even (sub)collection the Grahal
project team will set up a protocol taking into account all
specific needs of a collection. Both reversible and non-
reversible techniques can be part of the protocol. During
the mounting process services for label interpretation
and barcoding can be included.
Herbarium specimen mounting
Digitisation
Durable, affordable storage
Managing a herbarium collection often requires a huge amount of digital storage. Picturae can safely store your files in our two environmentally-friendly and secure storage areas. By entrusting this work to Picturae, you no longer need to take responsibility for the maintenance, security and software updates that storage and hosting entail.
Transcription Process
For rapid and high-quality transcription, a dedicated application, DETA, has been developed by Alembo. DETA has built-in correction rules and can work with any look-up tables that are available.
Some collections are already partially transcribed in existing systems and, if preferred, the transcription can be done within that system. For instance, the Naturalis project was done with the Rapid Data Entry tool of the BRAHMS collection software.
Technical Systems
Processing
Original order
Online availability
When digitizing millions of herbarium sheets it is almost
impossible, or at least extremely expensive and time
consuming, to have the herbarium staff transcribe the
labels.
Picturae and Alembo joined forces to work on the
Naturalis project and have served many other
herbariums since then. Alembo administers a
transcription team that types the cover sheet data
(i.e. geographic region and taxon name) and label
information into a data collection system.
The data can be combined with, or in, the images on the
sheets, as the transcription requirements are
different for every herbarium. Picturae provides the
delivery after quality assurance.
Alembo has become the worldwide professional
transcriber of herbarium sheets. With a speed of 60
sheets per hour, and with a team of 60
professionals, a
collection of 3 million sheets can be professionally and
affordably transcribed in less than a year.
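The figures above can be checked with a little arithmetic. The sketch below assumes that the 60 sheets per hour is a rate per transcriber and that a working day has 8 productive hours; neither assumption is stated in the text, and the function name is purely illustrative.

```python
def transcription_days(total_sheets, sheets_per_hour=60, team_size=60,
                       hours_per_day=8.0):
    """Working days needed for a team to transcribe a collection.

    Assumes sheets_per_hour is the rate of one transcriber and that
    every team member works hours_per_day productive hours.
    """
    team_rate_per_hour = sheets_per_hour * team_size
    return total_sheets / (team_rate_per_hour * hours_per_day)


days = transcription_days(3_000_000)
print(f"{days:.1f} working days")  # 3,000,000 / (60 * 60 * 8)
```

Under these assumptions the team handles 3,600 sheets per hour, so 3 million sheets take roughly 104 working days, which fits comfortably within the stated "less than a year" even with time allowed for training and review.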
The quality of our transcription is considered to be more
consistent and of botanist-level quality, thanks
to the unbiased and productive approach of the
transcribers. Due to the massive experience of the staff
they will pick out anomalies in labels and deal with them
efficiently. It is safe to say that the senior staff at Alembo
have seen and processed more herbarium labels than
any botanist would want to do in a lifetime!
Presentation and Collection Management
• Optimal search possibilities: the digitization of originals makes it easier to search
through a collection. If digital files are stored in our collection management system
Memorix Maior, it becomes possible to provide additional metadata. The system
also offers the option of interlinking files, such as multiple sheets found by the same
collector.
• High quality online viewer: it is possible to display either a part, or the entire
collection, on a public website. Students, biologists and other interested parties can
thus easily view the collection whether they are at home or on the road, on their
laptop, smartphone or tablet.
• Enrich your content: you can use your metadata to provide a map which shows where
the herbarium sheets were found in a specific region in a specific period. You can easily
visualize where different species were found, when and by whom.
In order to ensure high and consistent quality the transcription is executed in close cooperation with the herbarium and
with Picturae. The process is under full supervision of the herbarium and consists of the following steps:
• The transcription interpretation is agreed. The goal is to ensure consistency with the existing herbarium sheets
information held in current databases.
• A first batch is transcribed and thoroughly reviewed by all parties. In our experience this allows us to reach a more
consistent interpretation.
• The interpretation rules are improved, and transcription begins with a small professional team of transcribers. The
results are reviewed thoroughly, feedback is given and where necessary the interpretation rules are again modified.
• Once the team and the interpretation rules have been established, reviewed and found to be consistent,
the transcription team is scaled up to meet the scheduled target dates.
• During the transcription process reliable sample tests are performed in order to ensure the quality and
consistency. The whole process is under full review. The end result is a consistent collection of herbarium sheets
that can be accessed worldwide.
Transcription process diagram: Agree on Rules and Interpretation; Review First Batches; Update Interpretation Rules; Review Following Batches; Scale Up; Samples; Finalized Batches; Feedback
More information: T +31 (0)72 - 53 20 444 [email protected] www.picturae.com
Integration
Search API (JSON)
Component API
ArchiveKitchen components (data, layout and design): Herbarium, Media library, Webexpositions
Memorix Collection management
DAM (Digital Asset Management)
Enrichment
Memorix Archives
Databases
Adlib
Thesauri AAT, Geonames, DBpedia, (linked) OpenData, own knowledge sources
Indexing
EMU
DMS
EAD (XML)
Joomla
OpenData
Drupal
OAI-PMH
More information: T +31 (0)72 - 53 20 444 [email protected] www.digitalherbarium.com