Crowd-sourcing the creation of “articles” within the Biodiversity Heritage Library

Bianca Crowley and Trish Rose-Sandler


An analysis of crowd-sourced "article" creation and user-generated metadata for a digital repository of biodiversity literature


Page 1: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Crowd-sourcing the creation of “articles” within the Biodiversity Heritage Library

Bianca Crowley

Trish Rose-Sandler

Page 2: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

The BHL is…

• A consortium of 13 natural history and botanical libraries and research institutions

• An open access digital library for legacy biodiversity literature

• An open data repository of taxonomic names and bibliographic information

• An increasingly global effort

BHL LITA 2011

Page 3: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Problem: Books vs. Articles

Librarians manage books

Users need articles


Page 4: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Solution: “Article-ization”

Creating articles manually, through the help of our users: BHL PDF Generator

Creating articles through automated means: BioStor http://biostor.org/issn/0006-324X


Page, R. (2011). Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library. BMC Bioinformatics, 12(187). Retrieved from http://www.biomedcentral.com/1471-2105/12/187

Page 5: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library


Page 6: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Create-your-own PDF


Page 7: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

CiteBank today: http://citebank.org


Page 8: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

What is an “article” anyway?


Page 9: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

the Good, the Bad, the Ugly


Page 10: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

the Good, the Bad, the Ugly


Page 11: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

the Good, the Bad, the Ugly


Page 12: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Questions for Data Analysis

• What is the quality, or accuracy, of user-provided metadata?

• What kinds of content are users creating?

• How can we improve the PDF generator interface?


Page 13: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Stats

• Jan 2010-Apr 2011

  – Approx. 60,000 PDFs created from the PDF Generator

  – 40% of those (approx. 24,000) were ingested into CiteBank (PDFs without user-contributed metadata excluded)

• 5 reviewers analyzed 945 PDFs (approx. 3.9% of the 24,000+ articles going into CiteBank)

**Thanks to reviewers Gilbert Borrego, Grace Costantino, and Sue Graves from the Smithsonian Institution
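
The proportions quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the approximate figures given above; the variable names are illustrative, and because the slide reports rounded counts, the results are approximate as well.

```python
# Approximate figures from the slide (Jan 2010-Apr 2011).
pdfs_created = 60_000    # PDFs created with the PDF Generator
ingested_share = 0.40    # share of those ingested into CiteBank
pdfs_reviewed = 945      # PDFs rated by the 5 reviewers

# PDFs with user-contributed metadata that went into CiteBank.
pdfs_ingested = round(pdfs_created * ingested_share)

# Fraction of the ingested PDFs that the reviewers sampled.
review_share = pdfs_reviewed / pdfs_ingested

print(pdfs_ingested)          # 24000
print(f"{review_share:.1%}")  # 3.9%
```

The review sample therefore covers roughly one in every 25 user-created articles.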


Page 14: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Methodological approach

• Quantitative – numerical rating system

• Rated titles, authors, beg/end pages

• An item’s “findability” within CiteBank search often determined how it was rated


Page 15: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Ratings System

Title

• 1=has all characters in title letter for letter

• 2=does not have all characters in title letter for letter but is still findable in CiteBank search

• 3=does not have all characters in title letter for letter and is NOT findable via the CiteBank search


Page 16: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Ratings System

Author

• 1=has all characters in author(s) last name letter for letter

• 2=has at least one author’s last name spelled correctly

• 3=has no authors, or none of the authors’ last names are spelled correctly


Page 17: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Ratings System

Article beginning & ending pages

• 1=has all text pages for an article, from start to end

• 2=subset of pages from a larger article

• 3=a set of pages where the intellectual content has been compromised
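
The title and author rubrics from the Ratings System slides can be written down as small functions, which may make the 1/2/3 boundaries easier to see. This is only a sketch: the function names are hypothetical, "letter for letter" is assumed to mean exact string equality, and `findable` is passed in as a flag because it reflects a reviewer's check against the CiteBank search rather than anything computed here. The page-range rubric is left out since it depends on a human reading of the article.

```python
def rate_title(true_title: str, user_title: str, findable: bool) -> int:
    """Title rubric: 1 = exact, 2 = inexact but findable, 3 = inexact and not findable."""
    if user_title == true_title:  # assumption: "letter for letter" = exact equality
        return 1
    return 2 if findable else 3


def rate_authors(true_last_names: list[str], user_last_names: list[str]) -> int:
    """Author rubric: 1 = all last names correct, 2 = at least one, 3 = none or missing."""
    correct = [name for name in true_last_names if name in user_last_names]
    if true_last_names and len(correct) == len(true_last_names):
        return 1
    return 2 if correct else 3


# Hypothetical example: one of two last names is misspelled.
print(rate_authors(["Smith", "Jones"], ["Smith", "Jnes"]))  # 2
```

A lower score is better under this scheme: 1 means the user-entered value matches the source exactly.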


Page 18: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Analysis steps


Page 19: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Results

Title average: 1.68

Author(s) average: 1.33

Beg/End pg average: 1.41

Title & Author average: 1.50

Overall average (combines first 3 above): 1.47
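
The combined figures are consistent with simple unweighted means of the per-field averages. That is an assumption about how they were combined; the slide says only that the overall average "combines first 3 above". A minimal check, with illustrative variable names:

```python
from statistics import mean

# Per-field averages reported on the slide (1 = best, 3 = worst).
title_avg, author_avg, pages_avg = 1.68, 1.33, 1.41

# Assumption: combined figures are unweighted means of the field averages.
title_author_avg = mean([title_avg, author_avg])        # about 1.505; slide reports 1.50
overall_avg = mean([title_avg, author_avg, pages_avg])  # about 1.473; slide reports 1.47

print(f"{overall_avg:.2f}")  # 1.47
```

Because the inputs are themselves rounded to two decimals, small differences in the last digit are expected.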


Page 20: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

What did we learn?

• Ratings were better than we expected

• Many users took the time to create decent metadata

• “good enough” is not great but is still “findable”


Page 21: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

BHL-Australia’s new portal: http://bhl.ala.org.au/

there’s always room for improvement

Other factors

But of course…..


Page 22: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Changes we made to the UI so far

• Asking users if they want to contribute their article to CiteBank

• Making article title a required field and validating that it is at least 2 characters long

• Review button for users to review page selections and metadata (inspired by BHL-AUS)

• Reduced text and added more intuitive graphics (inspired by BHL-AUS)
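
The title rule in the list above (required, at least 2 characters) could be expressed roughly as follows. This is a sketch, not the PDF Generator's actual code: the function name and error messages are hypothetical, and trimming surrounding whitespace before checking is an added assumption.

```python
from typing import Optional

MIN_TITLE_LENGTH = 2  # from the slide: at least 2 characters


def validate_article_title(title: Optional[str]) -> list[str]:
    """Return validation errors; an empty list means the title is accepted."""
    cleaned = (title or "").strip()  # assumption: whitespace-only counts as empty
    if not cleaned:
        return ["Article title is required."]
    if len(cleaned) < MIN_TITLE_LENGTH:
        return [f"Article title must be at least {MIN_TITLE_LENGTH} characters."]
    return []


print(validate_article_title("  "))  # ['Article title is required.']
```

A check this small cannot guarantee an accurate title, but it rules out the empty and single-character entries that cannot be found by search at all.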


Page 23: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Brief survey of proposed changes

• Overwhelmingly positive response to proposed changes

there’s always room for improvement

But of course…..


Page 24: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Success Factors

• Monitor the creation of the metadata to look at user behavior and patterns

• Engage with your users

• Incentivize your users


Page 25: Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

@BioDivLibrary

/pages/Biodiversity-Heritage-Library/63547246565

/photos/biodivlibrary/sets/

/group/biodiversity-heritage-library

Bianca Crowley

Trish Rose-Sandler

http://biodiversitylibrary.org
