Author
matt
Herbarium Digitization Workshop
Database Tools & Techniques
Gil Nelson
September 16-18, 2012
Valdosta State University
Digitizing Biological Collections
Institute for Digital Information & Scientific Communication – Florida State University
https://www.idigbio.org/content/biological-collections-databases
iDigBio’s Biological Collections Databases, Tools, and Data Publication Portals
If there is something you’d like reviewed, let us know!
(On the Wiki under Database Resources)
Spreadsheets: The Scientist’s Buddy!
• Not relational (flat, not normalized)
• Has a mind of its own!
• Data quality issues
• Accepts various data types in same column
• Useful as a tool for download/upload
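The mixed-data-type issue can be checked for before a spreadsheet export is uploaded anywhere. The following is a minimal sketch, not part of the workshop material: the column names and sample rows are hypothetical, and the "number vs. text" classification is a deliberately coarse assumption.

```python
import csv
import io


def classify(value):
    """Return a coarse type label for one spreadsheet cell."""
    text = value.strip()
    if text == "":
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def mixed_type_columns(csv_text):
    """Map each column that mixes numbers and text to the types seen in it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            seen[name].add(classify(value or ""))
    return {
        name: sorted(types - {"empty"})
        for name, types in seen.items()
        if len(types - {"empty"}) > 1
    }


# Hypothetical export: the elevation column holds both numbers and free text.
SAMPLE = """catalog_number,collector,elevation_m
EX-0001,Collector A,120
EX-0002,Collector B,ca. 300
EX-0003,Collector C,
"""

print(mixed_type_columns(SAMPLE))
```

Columns flagged this way are candidates for cleanup before the data goes into a relational database, where a column normally has a single declared type.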
Microsoft Access
• Requires database design skills, at least at some level
• No ready-made apps
• Allows form & query development
• An option if no others exist
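To give a sense of what "database design skills" means in practice, here is a small sketch of a normalized two-table layout. It uses Python's built-in sqlite3 module as a stand-in, since Access itself is not scripted here; the table names, column names, catalog numbers, and collectors are all hypothetical.

```python
import sqlite3


def build_example():
    """Create a tiny normalized schema and return joined specimen rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (
            taxon_id INTEGER PRIMARY KEY,
            scientific_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE specimen (
            catalog_number TEXT PRIMARY KEY,
            taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
            collector TEXT,
            date_collected TEXT
        );
        """
    )
    # The name is stored once; specimens point to it by key.
    conn.execute(
        "INSERT INTO taxon (scientific_name) VALUES (?)",
        ("Abietinella abietina",),
    )
    taxon_id = conn.execute(
        "SELECT taxon_id FROM taxon WHERE scientific_name = ?",
        ("Abietinella abietina",),
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?, ?)",
        [
            ("EX-0001", taxon_id, "Collector A", "1928-07-30"),
            ("EX-0002", taxon_id, "Collector B", "1963-07-03"),
        ],
    )
    rows = conn.execute(
        """
        SELECT s.catalog_number, t.scientific_name
        FROM specimen AS s
        JOIN taxon AS t ON t.taxon_id = s.taxon_id
        ORDER BY s.catalog_number
        """
    ).fetchall()
    conn.close()
    return rows


rows = build_example()
print(rows)
```

In a flat spreadsheet the scientific name would be retyped on every row; here a correction to the name is made in one place and every specimen record picks it up through the join.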
Botanical Research and Herbarium Management System (BRAHMS)
Department of Plant Sciences, University of Oxford, UK
• FoxPro files
• Mostly European
• Fairly easy to use and set up
• Good training manual
• Links to IPNI
“Build Your Own”: OpenHerbarium at FSU
• Open source
• Apache/IIS
• PHP
• Enterprise level
• Can be installed on a workstation
• Requires database knowledge and skills
http://www.youtube.com/watch?v=UXvzZUlaB7I&feature=plcp
http://www.youtube.com/watch?v=faCP15wjc4g&feature=plcp
Data Capture/Enrichment Techniques
(See link on Wiki to Workflow Modules and Tasks: Data Capture)
Keystroking:
• From images
• From specimen sheets
• Long vs. short (skeleton) records
• May be the quickest, most efficient method, especially if recording skeleton records
Optical Character Recognition (OCR)
Scanning electronic images with software designed to extract and make readable embedded text.
OCR Software
ABBYY FineReader 11, Corporate
• Converts to Word or text, single files or multiple
• Provides a user interface
• Includes batch processing options
• Supports training to specific data sets
• Relatively inexpensive
• Relatively easy to configure
tesseract-ocr
• Tesseract open source OCR
• Originally developed by HP in the 1980s
• Development sponsored by Google since 2006
• Focus of iDigBio OCR working group
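As a rough illustration of how Tesseract can be scripted for batch work, the sketch below wraps its command-line form `tesseract imagename outputbase -l lang`. The file names are hypothetical, and the wrapper assumes the `tesseract` executable is installed and on the PATH; it returns None when it is not.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for the tesseract command-line tool."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run tesseract on one image and return the recognized text.

    Returns None when the tesseract executable is not on PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract writes its plain-text result to <output_base>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


# Hypothetical label image and output base name.
print(tesseract_command("label.tif", "label_ocr"))
```

Calling `ocr_image` in a loop over a folder of label images gives a simple batch run; the raw text it returns is unedited and will usually contain recognition errors.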
Optical Character Recognition (OCR)
Potential Uses
• Ingesting unedited OCR: Specify
• Building robust searches of unedited text: VSU
• Use as part of other software tools: Apiary, Symbiota
tesseract-ocr
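Searching unedited OCR text has to tolerate misread characters. The sketch below shows one generic way to do that with approximate string matching from Python's standard difflib module. It is not a description of how VSU or any of the named tools implement search; the 0.7 similarity cutoff is an arbitrary assumption, and the sample string is a line of raw OCR output.

```python
import difflib
import re


def fuzzy_find(term, ocr_text, cutoff=0.7):
    """Return OCR tokens that approximately match a search term."""
    tokens = re.findall(r"\S+", ocr_text)
    # Compare case-insensitively but report tokens as they appear in the text.
    lowered = {token.lower(): token for token in tokens}
    hits = difflib.get_close_matches(
        term.lower(), list(lowered), n=5, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


# Raw OCR output in which "Valdosta" was misread.
OCR_TEXT = "Herbarium of Vatdosta Stat# CoHwg* BRITISH COLUMBIA FLORA OF CANADA"

print(fuzzy_find("Valdosta", OCR_TEXT))
```

An exact-match search for "Valdosta" would miss this record; the approximate match finds the misread token, at the cost of occasional false positives if the cutoff is set too low.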
HERBARIUM OF WEST GEORGIA COLLEGE
Aerocladium trifarium (Web.& Mohr) R.& W.
Locality: SCOTLAND. Crianlarich,Mid Perth v.c. 88 flush in Cave Ardrain.
Habitat:
Date: July 3>19&3
Collector: E .G .Wallace No.:-
Altitude:
VSC-L00008
Herbarium of Vatdosta Stat# CoHwg*
BRITISH COLUMBIA
FLORA OF CANADA
Abietinella abietina (Hedw.) Fleisch.
On soil in woods, near Golden.
J. A. MacFadden 30 July 1928
VSC-L00001
Note barcode value
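Because the barcode value survives OCR more reliably than most of the label text, it can be pulled out with a simple pattern and used to tie the OCR output to a specimen record. The sketch below assumes a pattern of "VSC-L" followed by exactly five digits, which is inferred only from the two sample values VSC-L00008 and VSC-L00001 and may not cover the whole collection.

```python
import re

# Assumed pattern, inferred only from the sample values VSC-L00008 and
# VSC-L00001. No leading word boundary is required because OCR output
# often runs the barcode into the preceding text.
BARCODE_PATTERN = re.compile(r"VSC-L\d{5}(?!\d)")


def extract_barcodes(ocr_text):
    """Return every barcode-like value found in a block of OCR text."""
    return BARCODE_PATTERN.findall(ocr_text)


SAMPLE = "J. A. MacFadden 30 July 1928VSC-L00001\nAltitude:\nVSC-L00008"

print(extract_barcodes(SAMPLE))
```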
The Apiary Project: A collaborative workflow for extraction of herbarium label data
A project of BRIT and UNT’s Texas Center for Digital Knowledge
Apiary Project – www.apiaryproject.org - Funded by IMLS National Leadership Grant # 06-08-0079-08
Botanical Research Institute of Texas / UNT TxCDK
The Technology and Workflow
Digitize
Finding Regions of Interest
Transcription or OCR
http://vimeo.com/42586885
Uploading a CSV in Salix:
Cleaned text
Salix software download: http://daryllafferty.com/salix/
Salix documentation: http://nhc.asu.edu/vpherbarium/canotia/SALIX3.pdf
These links are on the Wiki under Database Resources and Tools
Voice/Speech Recognition
Dragon NaturallySpeaking
• Nuance (now owns IBM’s ViaVoice)
• Mac & PC
• Works better with a single user(?)
• ~$200.00 for premium version
Speech to text
• Training
• BRIT project (Windows API)
• Included with Windows
Capturing Bar Code Values
Barcode scanning
• Linear
• 2D
• Avoid data other than catalog number
Sync barcode value with camera-named files
Capturing Bar Code Values
File re-naming at capture
• Bardecodefiler
• BCRename
Renaming files to the barcode value
• FNIntercept
• SilverImage
Barcode values can be captured at more than one place in the workflow:
• Pre-digitization curation
• Data capture
• Image capture
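The renaming step itself is simple once each camera-named file has been paired with a barcode value. The sketch below is a generic illustration and does not describe how any of the tools named above work. It assumes the pairing already exists as a mapping from file name to barcode; the camera file names and their pairing with the two barcode values are hypothetical.

```python
import tempfile
from pathlib import Path


def rename_to_barcodes(directory, barcode_by_filename):
    """Rename each listed file to <barcode><original extension, lowercased>.

    Skips missing sources and never overwrites an existing target.
    Returns the new file names that were actually created.
    """
    directory = Path(directory)
    renamed = []
    for filename, barcode in barcode_by_filename.items():
        source = directory / filename
        target = directory / f"{barcode}{source.suffix.lower()}"
        if not source.exists() or target.exists():
            continue
        source.rename(target)
        renamed.append(target.name)
    return renamed


# Demonstration in a throwaway directory with hypothetical camera file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("IMG_0001.JPG", "IMG_0002.JPG"):
        (Path(tmp) / name).touch()
    mapping = {"IMG_0001.JPG": "VSC-L00001", "IMG_0002.JPG": "VSC-L00008"}
    result = rename_to_barcodes(tmp, mapping)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(result)
```

Refusing to overwrite an existing target is a deliberate choice: a duplicate barcode usually signals a scanning or pairing mistake that should be reviewed by a person rather than resolved silently.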
Thank You!