Herbarium Digitization Workshop

Database Tools & Techniques

Gil Nelson
September 16-18, 2012
Valdosta State University

Institute for Digital Information & Scientific Communication, Florida State University
Digitizing Biological Collections

iDigBio's Biological Collections Databases, Tools, and Data Publication Portals

https://www.idigbio.org/content/biological-collections-databases
(On the Wiki under Database Resources)

If there is something you'd like reviewed, let us know!

useful := fitness, robustness, usability, discoverability, elucidation, new knowledge

Spreadsheets: The Scientist's Buddy!

  • Not relational (flat, not normalized)
  • Has a mind of its own!
  • Data quality issues
  • Accepts various data types in the same column
  • Useful as a tool for download/upload
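The point about a spreadsheet accepting various data types in the same column can be illustrated with a small check. This is a minimal sketch, not a tool from the workshop: the column name `elevation_m`, the elevation values, and the catalog numbers VSC-L00009 and VSC-L00010 are made up for illustration.

```python
import csv
import io
from collections import Counter


def classify(value: str) -> str:
    """Return a coarse type label for one spreadsheet cell."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "text"


def column_type_counts(csv_text: str, column: str) -> Counter:
    """Count how many cells of each coarse type appear in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(classify(row[column]) for row in reader)


# Hypothetical export from a spreadsheet; values are illustrative only.
SAMPLE = """catalog_number,elevation_m
VSC-L00001,810
VSC-L00008,ca. 600
VSC-L00009,
VSC-L00010,1200.5
"""

counts = column_type_counts(SAMPLE, "elevation_m")
# A column is "mixed" when more than one non-empty type shows up in it.
is_mixed = len([kind for kind in counts if kind != "empty"]) > 1
print(dict(counts), "mixed column:", is_mixed)
```

A relational database with a typed column would reject the "ca. 600" entry at input time; the spreadsheet accepts it silently, which is one source of the data quality issues listed above.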

Microsoft Access

  • Requires database design skills, at least at some level
  • No ready-made apps
  • Allows form & query development
  • An option if no others exist

Botanical Research and Herbarium Management System (BRAHMS)

  • Department of Plant Sciences, University of Oxford, UK
  • FoxPro files
  • Mostly European
  • Fairly easy to use and set up
  • Good training manual
  • Links to IPNI

Build Your Own

  • OpenHerbarium at FSU

  • Open source
  • Apache/IIS
  • PHP
  • Enterprise level
  • Can be installed on a workstation
  • Requires database knowledge and skills

http://www.youtube.com/watch?v=UXvzZUlaB7I&feature=plcp
http://www.youtube.com/watch?v=faCP15wjc4g&feature=plcp

Data Capture/Enrichment Techniques
(See link on Wiki to Workflow Modules and Tasks: Data Capture)

Keystroking:
  • From images
  • From specimen sheets
  • Long vs. short (skeleton) records
  • May be the quickest, most efficient method, especially if recording skeleton records

Optical Character Recognition (OCR)
Scanning electronic images with software designed to extract embedded text and make it readable.

OCR Software

ABBYY FineReader 11 Corporate
  • Converts to Word or text, single files or multiple
  • Provides a user interface
  • Includes batch processing options
  • Supports training to specific data sets
  • Relatively inexpensive
  • Relatively easy to configure

tesseract-ocr
  • Tesseract open source OCR
  • Originally developed by HP in the 1980s
  • Development now sponsored by Google
  • Focus of iDigBio OCR working group
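For readers who want to try Tesseract on a label image, the following is a minimal sketch that calls the `tesseract` command-line program from Python. It assumes Tesseract is installed and on the PATH; the function names and the image filename are hypothetical and are not part of any workshop tool.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_command(image_path: Path, language: str = "eng") -> List[str]:
    """Build the command line; the output base "stdout" prints the text."""
    return ["tesseract", str(image_path), "stdout", "-l", language]


def ocr_label(image_path: Path, language: str = "eng") -> str:
    """Run Tesseract on one image and return the unedited OCR text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


# Hypothetical image name; shows the command that would be executed.
print(" ".join(build_tesseract_command(Path("VSC-L00001.jpg"))))
```

Calling `ocr_label(Path("VSC-L00001.jpg"))` would return the raw text for that image, which can then be stored unedited or passed to later cleanup steps.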

Optical Character Recognition (OCR)

Potential Uses
  • Ingesting unedited OCR: Specify
  • Building robust searches of unedited text: VSU
  • Use as part of other software tools: Apiary, Symbiota

tesseract-ocr
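One way to think about robust searching of unedited OCR text is approximate (fuzzy) matching, so that a query still finds a word the OCR engine misread. The sketch below uses Python's standard `difflib` module; it is an illustration of the general idea, not a description of how VSU or any named system implements its search, and the function name and cutoff value are assumptions.

```python
import difflib
import re
from typing import List


def fuzzy_find(query: str, ocr_text: str, cutoff: float = 0.8) -> List[str]:
    """Return words from ocr_text that approximately match query."""
    words = re.findall(r"[A-Za-z]+", ocr_text)
    # Compare case-insensitively but report the word as the OCR produced it.
    by_lowercase = {word.lower(): word for word in words}
    hits = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=5, cutoff=cutoff
    )
    return [by_lowercase[hit] for hit in hits]


# An OCR-damaged line of the kind unedited output contains.
OCR_TEXT = "Herbarium of Vatdosta Stat# CoHwg*"
print(fuzzy_find("Valdosta", OCR_TEXT))
```

An exact-match search for "Valdosta" would miss this record; the fuzzy search returns the misread word "Vatdosta". Lowering the cutoff finds more damaged words at the cost of more false hits.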

HERBARIUM OF WEST GEORGIA COLLEGE
Aerocladium trifarium (Web.& Mohr) R.& W.
Locality: SCOTLAND. Crianlarich,
Mid Perth v.c. 88 flush in Cave Ardrain.
Habitat:
Date: July 3>19&3
Collector: E .G .Wallace
No.:-
Altitude:
VSC-L00008

Herbarium of Vatdosta Stat# CoHwg*
BRITISH COLUMBIA
FLORA OF CANADA
Abietinella abietina (Hedw.) Fleisch.
On soil in woods, near Golden.
J. A. MacFadden
30 July 1928
VSC-L00001

Note barcode value
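Because the barcode value survives OCR even when the rest of the label is damaged, it can be pulled out of the unedited text with a pattern match. This is a minimal sketch: the pattern is an assumption inferred only from the two values shown (VSC-L00008 and VSC-L00001) and would need adjusting for other catalog number schemes.

```python
import re
from typing import List

# Assumed shape: 2-5 capital letters, a hyphen, one capital letter, 5 digits.
# The lookarounds keep the match from starting or ending inside a longer token.
BARCODE_PATTERN = re.compile(r"(?<![A-Za-z0-9])[A-Z]{2,5}-[A-Z]\d{5}(?!\d)")


def find_barcodes(ocr_text: str) -> List[str]:
    """Return catalog-number-like values in the order they appear."""
    return BARCODE_PATTERN.findall(ocr_text)


# Excerpts of the unedited OCR output, errors left as produced.
OCR_TEXT = """Date: July 3>19&3
Collector: E .G .Wallace
No.:-
Altitude:
VSC-L00008
Herbarium of Vatdosta Stat# CoHwg*
J. A. MacFadden
30 July 1928
VSC-L00001
"""

print(find_barcodes(OCR_TEXT))
```

The extracted value can serve as the key that links the OCR text, the image file, and the database record for the same specimen.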

The Apiary Project:
A collaborative workflow for extraction of herbarium label data

A project of BRIT and UNT's Texas Center for Digital Knowledge

Apiary Project, www.apiaryproject.org. Funded by IMLS National Leadership Grant # 06-08-0079-08. Botanical Research Institute of Texas / UNT TxCDK

The Technology and Workflow
Transformation from analog objects to structured data.

Digitize

Finding Regions of Interest
A person inspects the image and delineates each area that has textual content. Additionally, the regions can be classified as primary label, annotation, barcode, etc.

Transcription or OCR
Each region is displayed for human transcription.
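The region-of-interest step can be pictured as a small data structure: each delineated area is a rectangle on the sheet image plus a classification. The sketch below is illustrative only and is not Apiary's actual data model; the class names, field names, and pixel coordinates are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class RegionType(Enum):
    """Classifications named on the slide."""
    PRIMARY_LABEL = "primary label"
    ANNOTATION = "annotation"
    BARCODE = "barcode"


@dataclass(frozen=True)
class Region:
    """A rectangle, in pixels, that a person drew around textual content."""
    left: int
    top: int
    width: int
    height: int
    kind: RegionType

    def box(self) -> Tuple[int, int, int, int]:
        """Return (left, upper, right, lower), a common crop-box layout."""
        return (self.left, self.top, self.left + self.width, self.top + self.height)


def regions_of_type(regions: List[Region], kind: RegionType) -> List[Region]:
    """Select the regions to display for one transcription pass."""
    return [region for region in regions if region.kind is kind]


# Hypothetical regions on one specimen sheet image.
sheet_regions = [
    Region(1800, 2600, 900, 500, RegionType.PRIMARY_LABEL),
    Region(200, 2700, 600, 300, RegionType.ANNOTATION),
    Region(1900, 2400, 700, 150, RegionType.BARCODE),
]

for region in regions_of_type(sheet_regions, RegionType.PRIMARY_LABEL):
    print(region.kind.value, region.box())
```

Storing the classification with each rectangle lets later steps treat regions differently, for example sending primary labels to transcription or OCR first.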

Cleaned text

Uploading a CSV in Salix: http://vimeo.com/42586885

Salix software download: http://daryllafferty.com/salix/

Salix documentation: http://nhc.asu.edu/vpherbarium/canotia/SALIX3.pdf

These links are on the Wiki under Database Resources and Tools.

Voice/Speech Recognition

Dragon NaturallySpeaking
  • Nuance (now owns IBM's ViaVoice)
  • Mac & PC
  • Works better with a single user(?)
  • ~$200.00 for premium version

  • Speech to text
  • Training
  • BRIT project (Windows API)
  • Included with Windows

Capturing Bar Code Values

Barcode scanning
  • Linear
  • 2D
  • Avoid data other than catalog number

Sync barcode value with camera-named files
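Syncing barcode values with camera-named files amounts to building a lookup from each image filename to the barcode scanned for that specimen. The sketch below assumes one barcode scan per image, made in the same order the images were taken, and camera filenames whose zero-padded counters sort in capture order; the function name and filenames are hypothetical.

```python
from typing import Dict, List


def pair_barcodes_with_files(barcodes: List[str], filenames: List[str]) -> Dict[str, str]:
    """Map each camera-named file to the barcode scanned at the same step."""
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("duplicate barcode value scanned")
    # Assumes names like IMG_0001.JPG sort in the order they were captured.
    ordered_files = sorted(filenames)
    if len(ordered_files) != len(barcodes):
        raise ValueError("number of images and barcode scans differ")
    return dict(zip(ordered_files, barcodes))


scanned = ["VSC-L00001", "VSC-L00008"]          # in scan order
camera_files = ["IMG_0002.JPG", "IMG_0001.JPG"]  # in directory order

for filename, barcode in pair_barcodes_with_files(scanned, camera_files).items():
    print(filename, "->", barcode)
```

The two checks matter in practice: a skipped scan or an extra exposure shifts every later pairing, so it is safer to stop than to pair silently.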

Capturing Bar Code Values

File re-naming at capture
  • BardecodeFiler
  • BCRename

Renaming files to the barcode value
  • FNIntercept
  • SilverImage

Barcode values can be captured at more than one place in the workflow.
  • Pre-digitization curation
  • Data capture
  • Image capture
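Renaming an image to its barcode value can be sketched in a few lines. This is a generic illustration using Python's standard library, not a description of how any of the tools listed above work; the function name is hypothetical, and it assumes the barcode value for each image is already known.

```python
import tempfile
from pathlib import Path


def rename_to_barcode(image_path: Path, barcode: str) -> Path:
    """Rename one image to <barcode><extension> in the same folder."""
    target = image_path.with_name(barcode + image_path.suffix.lower())
    if target.exists():
        # Never overwrite: a clash usually means a barcode was scanned twice.
        raise FileExistsError(f"{target.name} already exists")
    image_path.rename(target)
    return target


# Demonstration on an empty placeholder file in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    camera_file = Path(folder) / "IMG_0001.JPG"
    camera_file.write_bytes(b"")
    renamed = rename_to_barcode(camera_file, "VSC-L00001")
    print(renamed.name)
```

Keeping the filename limited to the catalog number, in line with the advice to avoid data other than the catalog number in the barcode, makes the image easy to match to its database record.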
