Transcript
Page 1: Herbarium Digitization Workshop

Herbarium Digitization Workshop

Institute for Digital Information & Scientific Communication – Florida State University

Database Tools & Techniques

Gil Nelson

September 16-18, 2012

Valdosta State University

Page 2: Herbarium Digitization Workshop

Digitizing Biological Collections

https://www.idigbio.org/content/biological-collections-databases

iDigBio’s Biological Collections Databases, Tools, and Data Publication Portals

If there is something you’d like reviewed, let us know!

(On the Wiki under Database Resources)

Page 3: Herbarium Digitization Workshop

Spreadsheets: The Scientist’s Buddy!

• Not relational (flat, not normalized)
• Has a mind of its own!
• Data quality issues
• Accepts various data types in same column
• Useful as a tool for download/upload
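
The point about mixed data types in one column can be checked before a sheet is uploaded anywhere. A minimal sketch, assuming the spreadsheet has been exported to CSV; the column name `elevation`, the catalog numbers, and the sample rows are hypothetical:

```python
import csv
import io
from collections import Counter


def classify(value: str) -> str:
    """Return a coarse type label for one spreadsheet cell."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def column_type_counts(csv_text: str, column: str) -> Counter:
    """Count how many cells of each coarse type appear in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(classify(row[column]) for row in reader)


# Hypothetical export: the "elevation" column mixes numbers, text, and blanks.
SAMPLE = """catalog_number,elevation
VSC-0001,120
VSC-0002,ca. 300 m
VSC-0003,
VSC-0004,45.5
"""

counts = column_type_counts(SAMPLE, "elevation")
print(dict(counts))
```

A column that reports more than one label besides "empty" is a candidate for cleanup before it goes into a relational database.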

Page 4: Herbarium Digitization Workshop

Microsoft Access

• Requires database design skills, at least at some level
• No ready-made apps
• Allows form & query development
• An option if no others exist

Page 5: Herbarium Digitization Workshop

Botanical Research and Herbarium Management System
Department of Plant Sciences, University of Oxford, UK

• FoxPro files
• Mostly European
• Fairly easy to use and set up
• Good training manual
• Links to IPNI

Page 6: Herbarium Digitization Workshop

“Build Your Own”: OpenHerbarium at FSU

Page 7: Herbarium Digitization Workshop

Page 8: Herbarium Digitization Workshop

Page 9: Herbarium Digitization Workshop

Page 10: Herbarium Digitization Workshop

Page 11: Herbarium Digitization Workshop

• Open source
• Apache/IIS
• PHP
• Enterprise level
• Can be installed on a workstation
• Requires database knowledge and skills

Page 12: Herbarium Digitization Workshop

http://www.youtube.com/watch?v=UXvzZUlaB7I&feature=plcp

http://www.youtube.com/watch?v=faCP15wjc4g&feature=plcp

Page 13: Herbarium Digitization Workshop

Data Capture/Enrichment Techniques
(See link on Wiki to Workflow Modules and Tasks: Data Capture)

Keystroking:
• From images
• From specimen sheets
• Long vs. short (skeleton) records
• May be the quickest, most efficient method, especially if recording skeleton records
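
The long vs. short (skeleton) distinction can be pictured as two record shapes. A minimal sketch, assuming a skeleton record holds only the catalog number, name, and broad geography; which fields belong in a skeleton is a local decision, and every class and field name here is hypothetical:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SkeletonRecord:
    """Short record: just enough to find and file the specimen."""
    catalog_number: str
    scientific_name: str
    country: Optional[str] = None
    state_province: Optional[str] = None


@dataclass
class FullRecord(SkeletonRecord):
    """Long record: the skeleton fields plus the rest of the label data."""
    collector: Optional[str] = None
    collection_date: Optional[str] = None
    locality: Optional[str] = None
    habitat: Optional[str] = None


def missing_fields(record: FullRecord) -> list:
    """List the fields still to be filled in during later enrichment."""
    return [name for name, value in asdict(record).items() if value is None]


# Keystroke the skeleton now; promote it to a full record later.
skeleton = SkeletonRecord("VSC-L00001", "Abietinella abietina", "Canada", "British Columbia")
full = FullRecord(**asdict(skeleton))
print(missing_fields(full))
```

Keeping the skeleton as a strict subset of the full record means a later enrichment pass only adds values and never has to restructure what was already keyed.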

Page 14: Herbarium Digitization Workshop

Optical Character Recognition (OCR)
Scanning electronic images with software designed to extract embedded text and make it readable.

OCR Software

ABBYY FineReader 11, Corporate
• Converts to Word or text, single files or multiple
• Provides a user interface
• Includes batch processing options
• Supports training to specific data sets
• Relatively inexpensive
• Relatively easy to configure

Tesseract (tesseract-ocr)
• Open source OCR
• Originally developed by HP in the 1980s
• Development now sponsored by Google
• Focus of iDigBio OCR working group
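
Batch OCR with Tesseract can be scripted around its command-line program. A minimal sketch, assuming the `tesseract` executable is installed and on the PATH and that label images are JPEG files; the function names and the folder layout are hypothetical, not part of any workshop tool:

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path: str, language: str = "eng") -> list:
    """Build the command line that prints OCR text for one image to stdout."""
    return ["tesseract", image_path, "stdout", "-l", language]


def ocr_image(image_path: str, language: str = "eng") -> str:
    """Run Tesseract on one image and return the unedited text."""
    result = subprocess.run(
        tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract reports an error
    )
    return result.stdout


def ocr_folder(folder: str, pattern: str = "*.jpg") -> dict:
    """Batch step: map each matching image file name to its OCR text."""
    images = sorted(Path(folder).glob(pattern))
    return {image.name: ocr_image(str(image)) for image in images}
```

Calling `ocr_folder("labels")` on a folder of label images would return the raw text per file, ready to be stored unedited or passed on for cleanup.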

Page 15: Herbarium Digitization Workshop

Optical Character Recognition (OCR)

Potential Uses

• Ingesting unedited OCR: Specify
• Building robust searches of unedited text: VSU
• Use as part of other software tools: Apiary, Symbiota

tesseract-ocr
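
Searching unedited OCR text has to tolerate misread characters. A minimal sketch of one way to do that, using the similarity ratio in Python's standard `difflib` module; this illustrates the idea only and is not the search used at VSU, and the function name and the 0.7 cutoff are assumptions:

```python
import difflib
import re


def fuzzy_find(query: str, ocr_text: str, cutoff: float = 0.7) -> list:
    """Return words in the OCR text that approximately match the query."""
    words = re.findall(r"\S+", ocr_text)
    # Compare in lowercase, but report each word as it appears in the text.
    by_lower = {word.lower(): word for word in words}
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=5, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


# An OCR misreading of a herbarium name still matches the intended word.
print(fuzzy_find("Valdosta", "Herbarium of Vatdosta Stat# CoHwg*"))
```

Lowering the cutoff finds more badly damaged words at the cost of more false matches, so the threshold is worth tuning against a sample of real labels.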

Page 16: Herbarium Digitization Workshop

HERBARIUM OF WEST GEORGIA COLLEGE
Aerocladium trifarium (Web.& Mohr) R.& W.
Locality: SCOTLAND. Crianlarich,Mid Perth v.c. 88 flush in Cave Ardrain.
Habitat:
Date: July 3>19&3
Collector: E .G .Wallace No.:-
Altitude:
VSC-L00008

Herbarium of Vatdosta Stat# CoHwg*
BRITISH COLUMBIA
FLORA OF CANADA
Abietinella abietina (Hedw.) Fleisch.
On soil in woods, near Golden.
J. A. MacFadden 30 July 1928
VSC-L00001

Note barcode value
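
Because the barcode value survives OCR even when the label text does not, it can be pulled out of the raw text with a pattern match. A minimal sketch, assuming barcodes follow the form of the two sample values (`VSC-L` plus five digits); the pattern and function name are inferred for illustration:

```python
import re
from typing import Optional

# Assumed pattern, inferred from the sample values VSC-L00008 and VSC-L00001.
# No leading word boundary, so a barcode fused to preceding text still matches.
BARCODE_PATTERN = re.compile(r"VSC-L\d{5}(?!\d)")


def find_barcode(ocr_text: str) -> Optional[str]:
    """Return the first barcode-like value in the OCR text, or None."""
    match = BARCODE_PATTERN.search(ocr_text)
    return match.group(0) if match else None


print(find_barcode("J. A. MacFadden 30 July 1928\nVSC-L00001"))
```

The extracted value can then serve as the key that ties the unedited OCR text to the specimen's database record and image file.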

Page 17: Herbarium Digitization Workshop

Apiary Project – www.apiaryproject.org - Funded by IMLS National Leadership Grant # 06-08-0079-08
Botanical Research Institute of Texas / UNT TxCDK

The Apiary Project: A collaborative workflow for extraction of herbarium label data
A project of BRIT and UNT’s Texas Center for Digital Knowledge

Page 18: Herbarium Digitization Workshop

Page 19: Herbarium Digitization Workshop

The Technology and Workflow

Page 20: Herbarium Digitization Workshop

Digitize

Page 21: Herbarium Digitization Workshop

Finding Regions of Interest

Page 22: Herbarium Digitization Workshop

Transcription or OCR

Page 23: Herbarium Digitization Workshop

Uploading a CSV in Salix: http://vimeo.com/42586885

Cleaned text

Salix software download: http://daryllafferty.com/salix/

Salix documentation: http://nhc.asu.edu/vpherbarium/canotia/SALIX3.pdf

These links are on the Wiki under Database Resources and Tools

Page 24: Herbarium Digitization Workshop

Voice/Speech Recognition

Dragon NaturallySpeaking
• Nuance (now owns IBM’s ViaVoice)
• Mac & PC
• Works better with a single user(?)
• ~$200.00 for premium version

• Speech to text
• Training
• BRIT project (Windows API)
• Included with Windows

Page 25: Herbarium Digitization Workshop

Capturing Bar Code Values

Barcode scanning
• Linear
• 2D
• Avoid data other than catalog number

Sync barcode value with camera-named files

Page 26: Herbarium Digitization Workshop

Capturing Bar Code Values

File re-naming at capture

Bardecodefiler, BCRename

Renaming files to the barcode value

FNIntercept, SilverImage

Barcode values can be captured at more than one place in the workflow:
• Pre-digitization curation
• Data capture
• Image capture
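
Syncing barcode values with camera-named files amounts to a rename driven by a lookup table. A minimal sketch of the idea, assuming a scan log that pairs each camera file name with the barcode scanned for that specimen; it does not reproduce how any of the tools named above work, and the file names and function name are hypothetical:

```python
import tempfile
from pathlib import Path


def rename_to_barcodes(folder: Path, barcode_by_file: dict) -> list:
    """Rename each camera-named file to its barcode value, keeping the extension."""
    renamed = []
    for old_name, barcode in barcode_by_file.items():
        source = folder / old_name
        target = folder / f"{barcode}{source.suffix.lower()}"
        if not source.exists():
            continue  # the scan log lists a file that was never written
        if target.exists():
            raise FileExistsError(f"{target.name} already exists; duplicate barcode?")
        source.rename(target)
        renamed.append(target.name)
    return renamed


# Demonstration on a throwaway folder with hypothetical camera file names.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("IMG_0001.JPG", "IMG_0002.JPG"):
        (folder / name).touch()
    log = {"IMG_0001.JPG": "VSC-L00001", "IMG_0002.JPG": "VSC-L00008"}
    new_names = rename_to_barcodes(folder, log)
    print(new_names)
```

Refusing to overwrite an existing target is deliberate: two images resolving to the same barcode usually signals a scanning or sequencing mistake that a person should look at.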

Page 27: Herbarium Digitization Workshop

Thank You!

Page 28: Herbarium Digitization Workshop

Page 29: Herbarium Digitization Workshop
