
More than just a pretty picture: improving the discoverability of illustrations in the Biodiversity Heritage Library


DESCRIPTION

This demo was given by Trish Rose-Sandler and Kyle Jaebker at the Museums and the Web Conference on April 20, 2013. It covered how BHL is improving access to its natural history illustrations via Flickr and via the Art of Life project. Authors of the poster and handouts: Gilbert Borrego, Grace Costantino, Bianca Crowley, Kyle Jaebker, and Trish Rose-Sandler.


Page 1: More than just a pretty picture:  improving the discoverability of illustrations in the Biodiversity Heritage Library

by Gilbert Borrego, Grace Costantino, Bianca Crowley, Kyle Jaebker, Trish Rose-Sandler

The Biodiversity Heritage Library (BHL) is....

• An open access digital library for historic biodiversity literature

• An open data repository of taxonomic names and bibliographic information

Hidden within BHL literature are millions of rich illustrations

Image access via Flickr

BHL staff manually identify and push BHL images to a Flickr stream (www.flickr.com/photos/biodivlibrary), but the process does not scale to the millions of images available.

Users can add tags to images in Flickr so that they are searchable. They are also encouraged to add species names via machine tags so BHL can automatically share these images with the Encyclopedia of Life (http://eol.org/collections/53002).

Art of Life project

The Art of Life project, enabled by a grant from NEH, aims to automate the process of identifying and tagging images via algorithms.

The project defined a metadata schema for natural history illustrations that will help crowdsource more detailed descriptions via image portals such as Wikimedia Commons (http://tinyurl.com/9hm7nsb).

www.biodiversitylibrary.org

Page 2: More than just a pretty picture:  improving the discoverability of illustrations in the Biodiversity Heritage Library

Uploading Images to Flickr

The Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org) provides access to thousands of scientific illustrations through the social media site Flickr. To expedite the process of uploading these images to Flickr, a workflow was developed within BHL’s backend database. When paginating, or enhancing a book’s page metadata, staff can click a single button to upload all illustrations within that book to Flickr. Bibliographic information and a link to the image in BHL are also embedded during the process.

This workflow was internally documented in the form of a tutorial to ensure that all BHL partners can contribute to this effort and be part of the program’s expanding outreach efforts.

The use of Flickr as an outreach platform exposes our rich image collection to search engines and new users. Additionally, it allows us to provide images of species for inclusion on the Encyclopedia of Life’s taxon pages. While the original intention of BHL’s Flickr account was to provide easy access to scientific figures, plates and illustrations, the site has taken on a life of its own, and its images are being repurposed by users all around the world in the most imaginative ways.

From BHL’s backend dashboard, staff select the pages to upload to Flickr.

Final view in Flickr.

Once images are uploaded, staff can create sets, add additional bibliographic information, and assign sets to collections.

Visit the BHL Flickr today! http://www.flickr.com/photos/biodivlibrary

Learn how you can help add species names to BHL Images: http://www.flickr.com/groups/encyclopedia_of_life/discuss/72157629515768640/

Page 3: More than just a pretty picture:  improving the discoverability of illustrations in the Biodiversity Heritage Library

The Flickr Tagging Process

Crowdsourcing Species Identification and Image Tagging

The Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org), an open access digital library consortium for biodiversity literature, utilizes Flickr to provide access to thousands of images extracted from its digital collections. In order to improve discoverability and usability of these images, BHL crowdsources the task of adding species name machine tags to images in Flickr.

Tags are searchable keywords that users can apply to images in Flickr. Machine tags are specially formatted to be read by computers: taxonomy:binomial="Genus species"
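The machine tag format above can be illustrated with a short Python sketch. It is not BHL or EOL code: the helper names `make_binomial_tag` and `parse_machine_tag` are hypothetical, and the pattern assumes the general namespace:predicate=value shape shown in the example rather than any official grammar.

```python
import re
from typing import Optional, Tuple

# Assumed shape of a machine tag: namespace:predicate=value
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*)"
    r":(?P<predicate>[A-Za-z][A-Za-z0-9_]*)"
    r"=(?P<value>.+)$"
)


def make_binomial_tag(genus: str, species: str) -> str:
    """Build a species-name machine tag (hypothetical helper)."""
    # By convention the genus is capitalized and the species epithet is lower case.
    name = f"{genus.strip().capitalize()} {species.strip().lower()}"
    # The value contains a space, so it is wrapped in double quotes.
    return f'taxonomy:binomial="{name}"'


def parse_machine_tag(tag: str) -> Optional[Tuple[str, str, str]]:
    """Split a machine tag into (namespace, predicate, value), or return None."""
    match = MACHINE_TAG.match(tag.strip())
    if match is None:
        return None  # an ordinary keyword tag, not a machine tag
    value = match.group("value").strip().strip('"')
    return match.group("namespace"), match.group("predicate"), value


tag = make_binomial_tag("panthera", "LEO")
print(tag)
print(parse_machine_tag(tag))
print(parse_machine_tag("lion"))
```

A harvester could use a parser like this to keep only tags whose namespace is `taxonomy` and whose predicate is `binomial`, then look up the value as a species name.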

BHL encourages its users to identify the species depicted in an image using the book’s image descriptions and add that species name to the image as a machine tag. Once these tags have been added to BHL images, anyone can search within Flickr for images of a specific species, and BHL can automatically share the images with the Encyclopedia of Life (EOL, www.eol.org).

EOL is an open access project dedicated to providing a webpage for every species. EOL harvests machine-tagged images from the BHL Flickr, uploads them to a BHL Image Collection in EOL, and automatically associates the images with the matching species page. To date, thousands of machine-tagged images have been added to EOL.


Find an image in Flickr

Add a species name machine tag

The image is automatically ingested into the BHL Image Collection in EOL

And automatically associated with the corresponding species page in EOL

Page 4: More than just a pretty picture:  improving the discoverability of illustrations in the Biodiversity Heritage Library

Users clamor for the Art of Life

The Art of Life project evolved out of a need to improve access to the rich corpus of natural history illustrations hidden within the digitized pages of books and journals in the Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org). Currently, these illustrations have no descriptive metadata such as title, creator or subject matter that can be searched. The only way to uncover these gems is by opening up a BHL book or volume and scrolling through page by page.

One solution has been for BHL staff to manually identify pages that contain illustrations and to push those pages into a BHL Flickr stream which allows for discovery through themed collections and in some cases species names. While this approach has resulted in improved access to some of BHL’s illustrations, it requires significant staff time and the process does not scale well to the millions of images that are present within the BHL pages.

Example of an illustration described using Art of Life schema

Illustration schema elements.


Read more about the Art of Life project: http://biodivlib.wikispaces.com/Art+of+Life

Elements chosen were a mix of VRA Core 4.0 and Darwin Core

Workflow diagram that outlines how each illustration will move through the Art of Life processes.

Thus, the Art of Life project was designed as a solution for automating the identification of images and crowdsourcing their descriptions. The project is a partnership between the Missouri Botanical Garden and the Indianapolis Museum of Art, supported by the National Endowment for the Humanities. It runs from May 2012 to April 2014. The Art of Life has five primary objectives: 1) define a metadata schema appropriate for natural history illustrations; 2) build algorithms to automatically identify BHL pages with illustrations; 3) sort and classify the illustrations; 4) crowdsource descriptions through tagging applications; and 5) integrate descriptive metadata back into BHL and share images and descriptions with audiences outside of BHL. These illustrations will be of interest to a range of audiences, including artists, biologists, humanities scholars, librarians, educators, and citizen scientists.

Page 5: More than just a pretty picture:  improving the discoverability of illustrations in the Biodiversity Heritage Library

Automating the Heavy Lifting

Using Algorithms to Identify Images in BHL

In the Art of Life project, the Indianapolis Museum of Art (IMA) and the Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org) have been working to develop algorithms that identify images within the pages of books and journals digitized for BHL. Multiple algorithms are being developed, including ABBYY OCR, contrast, color, and compression. These algorithms are being tested to determine the most efficient and accurate means of identifying images.
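The poster does not describe how any of these algorithms work internally. As a rough illustration of the compression idea only, the Python sketch below assumes that a page of sparse text compresses far better than a page dominated by an illustration. The function names, the use of `zlib`, the synthetic pages, and the `threshold` value are all hypothetical and are not the project's implementation.

```python
import random
import zlib


def compression_ratio(pixels: bytes) -> float:
    """Return compressed size divided by raw size for one page's pixel bytes."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    return len(zlib.compress(pixels, 9)) / len(pixels)


def likely_illustration(pixels: bytes, threshold: float = 0.5) -> bool:
    """Flag a page whose pixel data compresses poorly.

    The threshold is a placeholder; a real value would have to be tuned
    against pages whose content is already known.
    """
    return compression_ratio(pixels) > threshold


# Synthetic stand-ins for scanned pages (not real BHL data):
# a uniformly white page and a page of random grayscale noise.
blank_page = bytes([255]) * 10_000
rng = random.Random(0)
noisy_page = bytes(rng.randrange(256) for _ in range(10_000))

print(round(compression_ratio(blank_page), 4), likely_illustration(blank_page))
print(round(compression_ratio(noisy_page), 4), likely_illustration(noisy_page))
```

Real scans fall between these two extremes, which is why a single signal like this would presumably be combined with others such as contrast or color.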

The IMA developed a set of software tools for running the algorithms and analyzing their results. This software allows for the import of publications and journals determined to be good test samples for the algorithms. These samples, termed the “Gold Standard”, are being used to evaluate how useful each algorithm will be in determining whether a scan contains a sketch or drawing. A custom-built interface for reviewing the results shows both accurate detections and false positives. In addition to the visual review of results, analysis across the entire “Gold Standard” is ongoing to determine the best combination of algorithms.
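Scoring an algorithm against a labeled sample such as the “Gold Standard” can be sketched in Python as follows. This is a generic precision and recall calculation, not the IMA's tooling; the `evaluate` function, the page identifiers, and the labels are made up for illustration.

```python
from typing import Dict


def evaluate(predicted: Dict[str, bool], gold: Dict[str, bool]) -> Dict[str, float]:
    """Compare per-page predictions with hand-labeled gold-standard pages."""
    tp = fp = fn = 0
    for page_id, has_illustration in gold.items():
        # A page the algorithm never reported counts as "no illustration".
        guess = predicted.get(page_id, False)
        if guess and has_illustration:
            tp += 1  # correctly flagged
        elif guess and not has_illustration:
            fp += 1  # false positive: flagged a text-only page
        elif not guess and has_illustration:
            fn += 1  # missed illustration
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical labels for four pages.
gold = {"p1": True, "p2": False, "p3": True, "p4": False}
predicted = {"p1": True, "p2": True, "p3": False, "p4": False}

print(evaluate(predicted, gold))
```

Running the same comparison for each algorithm, and for combinations of them, gives a consistent basis for choosing among them.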

Once completed, the algorithms will be deployed on a cluster to process the entire BHL collection. After the processing has been completed, the resulting metadata will be used to add descriptive information and finding aids. This will allow users to discover and work with illustrations in the books and journals that were previously very hard to find.


Algorithm Results Viewer

Compression Ratio Algorithm Analysis

Close-up Algorithm Result