The PLOS Thesaurus: the first year. Rachel Drysdale, Taxonomy Manager, PLOS. DHUG 2014, 11th February 2014.

Case Study: Public Library of Science Thesaurus: Year One

DESCRIPTION

Presented at the 10th annual Data Harmony Users Group meeting on Tuesday, February 11, 2014 by Rachel Drysdale of PLOS. Discusses the process of building and integrating their new thesaurus into the PLOS journals workflow and publication platform. From constructing the thesaurus to creating channels for feedback and updates, through building new current awareness and discovery tools, to gathering data for article level metrics and web site analytics, follow their progress through to today’s PLOS websites and services.

Page 1: Case Study: Public Library of Science Thesaurus: Year One

The PLOS Thesaurus: the first year

Rachel Drysdale – Taxonomy Manager, PLOS

DHUG 2014

11th February, 2014

Page 2: Case Study: Public Library of Science Thesaurus: Year One

Public Library of Science - evolution

2000 PLOS founded

2003 PLOS Biology

2004 PLOS Medicine

2005 PLOS Computational Biology (June), PLOS Genetics (July), PLOS Pathogens (September)

2006 PLOS ONE

2007 PLOS Neglected Tropical Diseases

Page 3: Case Study: Public Library of Science Thesaurus: Year One

Journal                             Article Count
PLOS Biology                        3,450
PLOS Medicine                       2,626
PLOS Computational Biology          3,112
PLOS Genetics                       4,048
PLOS Pathogens                      3,639
PLOS ONE                            87,296
PLOS Neglected Tropical Diseases    2,444

Page 4: Case Study: Public Library of Science Thesaurus: Year One

beautiful monster….

Page 5: Case Study: Public Library of Science Thesaurus: Year One

Overview – today’s talk

The Solution: Good Thesaurus + Machine Aided Indexing

Building the new Thesaurus with Access Innovations (AI)

The initial implementation at plos.org

MAIstro integration into Publishing workflow

Thesaurus maintenance

The Service:

Content Discovery

Article Analysis: Relative Metrics

Page 6: Case Study: Public Library of Science Thesaurus: Year One

Starting point

2011 – the old Taxonomy

Inadequate in content: just over 3,100 specific terms

Inflexible in structure: terms in pre-defined paths

Housed in Editorial Manager: ossified and difficult to update

Terms associated with each article were chosen by its authors

Page 7: Case Study: Public Library of Science Thesaurus: Year One

PLOS delivered to Access Innovations….

A copy of the old PLOS Taxonomy

Over 2,000 suggested changes

A request for a new “Research and analysis methods” branch

Use cases:

Subject Area-based searches

Hierarchy-based exploration of our corpus

Email Alerts based on Subject Area searches

RSS Feeds based on Subject Areas


Page 8: Case Study: Public Library of Science Thesaurus: Year One

Access Innovations added:

STEM vocabulary

Broader/Narrower term relationships

Rules for the Machine Aided Indexing

Synonyms

Analysis with respect to the PLOS corpus

... to and fro with PLOS ...

Result:

Vastly improved ANSI/NISO Z39.19-compliant thesaurus

Page 9: Case Study: Public Library of Science Thesaurus: Year One

Statistics

                Old Taxonomy    Access Innovations (A.I.) Thesaurus
Terms           3,132           10,156
Synonyms        0               3,291
Tiers           5               7
Rules           0               14,798

Page 10: Case Study: Public Library of Science Thesaurus: Year One

Top-level Terms

1. Biology and life sciences

2. Computer and information sciences

3. Earth sciences

4. Engineering and technology

5. Environmental sciences and ecology

6. Medicine and health sciences

7. Physical sciences

8. Research and analysis methods

9. Science policy

10. Social sciences


Page 11: Case Study: Public Library of Science Thesaurus: Year One

Infrastructure

PLOS Taxonomy server:

Thesaurus – plos2012thes

Data Harmony Thesaurus Master and MAI Rule Builder

Corpus fed to the Taxonomy Server for MAI

Article by article

Initial implementation:

Title – Abstract - Results – Methods

Top 8 hits selected
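
As a rough illustration of the initial implementation, the sketch below joins the four indexed sections and keeps the eight most frequent matching terms. It assumes a plain substring count as a stand-in for the MAI rules, and the names `index_article`, `SECTIONS` and `TOP_N` are hypothetical, not part of Data Harmony.

```python
from collections import Counter

# Sections fed to the indexer in the initial implementation.
SECTIONS = ("title", "abstract", "results", "methods")
TOP_N = 8  # "Top 8 hits selected"


def index_article(article: dict, vocabulary: list[str],
                  top_n: int = TOP_N) -> list[tuple[str, int]]:
    """Return up to top_n (term, frequency) pairs for one article."""
    # Only the configured sections are indexed; anything else is ignored.
    text = " ".join(article.get(s, "") for s in SECTIONS).lower()
    counts = Counter()
    for term in vocabulary:
        n = text.count(term.lower())  # assumption: naive substring matching
        if n:
            counts[term] = n
    # Sort by frequency (descending), then alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

An article whose indexed sections match no vocabulary term comes back with an empty list, which is one plausible way for articles to end up without Subject Area terms.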


Page 12: Case Study: Public Library of Science Thesaurus: Year One

Elapsed time from project kick-off until terms appeared on published articles: 9 months

Page 13: Case Study: Public Library of Science Thesaurus: Year One


Learning curve – teething troubles

Not all articles had Subject Area terms – why not?

Initial implementation – text to index:

Title + Abstract* + Results + Methods

Upon consideration – text to index:

Full Text (though not references)

Implementation of “all paths”

Polyhierarchy implications

Page 14: Case Study: Public Library of Science Thesaurus: Year One

Consider “White blood cells”

Biology and life sciences > Immunology > Immune cells > White blood cells

Medicine and health sciences > Immunology > Immune cells > White blood cells

Biology and life sciences > Cell biology > Cellular types > Animal cells > Blood cells > White blood cells

Biology and life sciences > Cell biology > Cellular types > Animal cells > Immune cells > White blood cells

The polyhierarchy and Search
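
To make the "all paths" idea concrete, here is a minimal sketch that enumerates every route from a top-level term down to "White blood cells". The parent links are read off the example above; the `PARENTS` mapping and the `all_paths` function are illustrative assumptions, not how the thesaurus is actually stored.

```python
# Parent links for the "White blood cells" example.
# In a polyhierarchy a term may have more than one parent.
PARENTS = {
    "White blood cells": ["Immune cells", "Blood cells"],
    "Immune cells": ["Immunology", "Animal cells"],
    "Blood cells": ["Animal cells"],
    "Immunology": ["Biology and life sciences", "Medicine and health sciences"],
    "Animal cells": ["Cellular types"],
    "Cellular types": ["Cell biology"],
    "Cell biology": ["Biology and life sciences"],
}


def all_paths(term, parents=PARENTS):
    """Return every path from a top-level term down to `term`."""
    ups = parents.get(term, [])
    if not ups:  # no parents: this is a top-level term
        return [[term]]
    # Extend each path that reaches a parent with the current term.
    return [path + [term] for up in ups for path in all_paths(up, parents)]


for path in all_paths("White blood cells"):
    print(" > ".join(path))
```

For search, the practical consequence is that an article indexed with "White blood cells" should be retrievable under any ancestor on any of these paths, not just the ancestors on one of them.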

Page 15: Case Study: Public Library of Science Thesaurus: Year One


Establishing update cycle - articles:

Initial implementation:

Entire back-corpus indexed at once

New Papers:

PLOS submits text to MAIstro at publication

MAI returns terms and term frequencies

PLOS stores terms in search engine
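
The three steps for new papers can be sketched as a small function. Everything here is a stand-in: `mai_index` represents the call to the indexing service (its real interface is not described in the talk), `fake_mai` is a toy indexer for the example, and a dictionary plays the part of the search engine.

```python
from typing import Callable, Dict

# Stand-in for the search engine: article id -> {term: frequency}.
SearchIndex = Dict[str, Dict[str, int]]


def publish(article_id: str, full_text: str,
            mai_index: Callable[[str], Dict[str, int]],
            search_index: SearchIndex) -> Dict[str, int]:
    """Submit text for indexing at publication and store the returned terms."""
    terms = mai_index(full_text)            # submit text, receive terms and frequencies
    search_index[article_id] = dict(terms)  # store the terms alongside the article
    return terms


def fake_mai(text: str) -> Dict[str, int]:
    """Toy indexer used only for this example; counts two fixed terms."""
    lowered = text.lower()
    return {t: lowered.count(t.lower())
            for t in ("T cells", "Obesity") if t.lower() in lowered}
```

Passing the indexer in as a parameter keeps the publication step independent of which thesaurus version is behind it.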

Page 16: Case Study: Public Library of Science Thesaurus: Year One


Establishing update cycle - thesauri:

Separate instances (nerves):

Production server – plosthes.2013-6

Working version – plosthes.2013-7

When ready to release a new version:

Load onto test server, run MAI over the corpus, and index

Test: new/changed/deleted terms

rule changes

structural changes

any implementation changes
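
One way to draw up the list of terms to check before a release is to diff the production and working versions. The sketch below assumes each version can be exported as a mapping from term to its set of parent terms; that export shape and the function name `diff_thesauri` are hypothetical.

```python
def diff_thesauri(old: dict[str, set[str]],
                  new: dict[str, set[str]]) -> dict[str, set[str]]:
    """Compare two versions, each mapping a term to its set of parent terms."""
    old_terms, new_terms = set(old), set(new)
    return {
        "new": new_terms - old_terms,
        "deleted": old_terms - new_terms,
        # A structural change: the term survives but its parents differ.
        "moved": {t for t in old_terms & new_terms if old[t] != new[t]},
    }
```

Rule changes and implementation changes would need their own checks; this only covers terms and structure.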

Page 17: Case Study: Public Library of Science Thesaurus: Year One


Thesaurus updates – why?

More terms : Memory T cells, Monocotyledons

Errrm… : Report gene detection

What? : Webs

Hierarchy changes deemed desirable:

Geographical locations

Organisms

(Un)Rule(y) : snails, fabrication, pumas

Page 18: Case Study: Public Library of Science Thesaurus: Year One

Thesaurus updates – how?

Page 22: Case Study: Public Library of Science Thesaurus: Year One


Rule-Building in MAIstro – Pumas before...

Page 23: Case Study: Public Library of Science Thesaurus: Year One

Rule-Building in MAIstro – Pumas before...

p53 upregulated modifier of apoptosis (PUMA), or the animal?

Page 24: Case Study: Public Library of Science Thesaurus: Year One

Rule-Building in MAIstro – Pumas after…
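
The puma problem is a case for a context-dependent rule: the word alone should not trigger a term, but the word plus nearby evidence can. The sketch below shows the idea in Python. It is not MAIstro rule syntax, and the cue lists and the output terms "Apoptosis" and "Pumas" are assumptions chosen for illustration.

```python
import re

# Hypothetical context cues; a real rule base would be tuned against the corpus.
GENE_CUES = ("apoptosis", "p53", "bcl-2")
ANIMAL_CUES = ("habitat", "predator", "felid")


def index_puma(text: str) -> list[str]:
    """Pick subject terms for text that mentions 'puma', based on context words."""
    lowered = text.lower()
    if not re.search(r"\bpumas?\b", lowered):
        return []  # the rule only fires when the ambiguous word is present
    terms = []
    if any(cue in lowered for cue in GENE_CUES):
        terms.append("Apoptosis")  # gene sense: p53 upregulated modifier of apoptosis
    if any(cue in lowered for cue in ANIMAL_CUES):
        terms.append("Pumas")      # animal sense
    return terms  # no cue found: assign nothing rather than guess
```

Returning nothing when no cue is found trades a missed term for a wrong one, which is a choice a real rule set might make differently.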

Page 26: Case Study: Public Library of Science Thesaurus: Year One

Thesaurus updates – prioritisation?

Miss-hits and missed term reports:

Ourselves:

article pages

Our readers:

in email

in complaints on Twitter

in correspondence with our editorial staff

via Journal and Saved Search alerts

via article pages – Flagged Term reports

Page 28: Case Study: Public Library of Science Thesaurus: Year One

Things we learned – Thesaurus editorial

Tension: strict and rigorous taxonomy/ontology construction vs user utility

Abbreviations and Synonyms

Issues that continue to exercise us:

T cells/Memory T cells

Obesity/Childhood obesity

When should we make both explicit?

Rule work – working to top 8

Page 29: Case Study: Public Library of Science Thesaurus: Year One


Building a new project - exports

Page 30: Case Study: Public Library of Science Thesaurus: Year One


Building a new project - import

Page 31: Case Study: Public Library of Science Thesaurus: Year One

Content Discovery

How has having the thesaurus changed the way that users interact with PLOS web sites?

Page 32: Case Study: Public Library of Science Thesaurus: Year One

Problem:

How to keep up?

Solution:

Current Awareness Tools

• Journal alerts

• Saved Searches

• RSS feeds

• Hierarchy exploration

Page 34: Case Study: Public Library of Science Thesaurus: Year One

Journal alerts

Page 39: Case Study: Public Library of Science Thesaurus: Year One

Saved search

Page 41: Case Study: Public Library of Science Thesaurus: Year One

RSS feeds

Page 43: Case Study: Public Library of Science Thesaurus: Year One

Hierarchy exploration

Page 49: Case Study: Public Library of Science Thesaurus: Year One

Relative Metrics

Page 50: Case Study: Public Library of Science Thesaurus: Year One

Relative Metrics: Defining a Paper’s Peer Group

1. Group papers by Subject Area

Accommodate multiple topics per paper

2. Group papers by age

Important for comparison of cumulative measures like total downloads or citations

3. Determine norms for peer group

The average usage of each paper is compared with the median usage of its peer group

More on Relative Metrics at:

http://www.plosone.org/static/almInfo#relativeMetrics
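
The three steps can be sketched in a few lines. This is a simplified reading, not the PLOS implementation: it assumes papers are plain dictionaries, that "age" means publication year, and that the norm is the median of the chosen metric across the peer group (which includes the paper itself). The field names and `relative_metric` are hypothetical.

```python
from statistics import median


def relative_metric(paper: dict, corpus: list[dict],
                    metric: str = "views") -> dict[str, float]:
    """Ratio of a paper's metric to the median of its peers, per Subject Area.

    Peers share a Subject Area and a publication year with the paper.
    """
    ratios = {}
    for area in paper["subject_areas"]:  # a paper may sit in several areas
        peers = [p[metric] for p in corpus
                 if area in p["subject_areas"] and p["year"] == paper["year"]]
        norm = median(peers) if peers else 0
        if norm:  # skip areas with no usable norm
            ratios[area] = paper[metric] / norm
    return ratios
```

A ratio above 1 means the paper is used more than the typical paper of the same age in that Subject Area. Because a paper can carry several Subject Areas, it gets one ratio per area rather than a single score.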

Page 55: Case Study: Public Library of Science Thesaurus: Year One

Area of development - Editorial Workflow

Page 56: Case Study: Public Library of Science Thesaurus: Year One

The PLOS Thesaurus and Peer Review

Maintaining a copy of the PLOS thesaurus in Editorial Manager helps with editor and reviewer matching

Classifications for People

Classifications for Papers

Page 57: Case Study: Public Library of Science Thesaurus: Year One

The PLOS Thesaurus and Peer Review

• Authors select Subject Area terms related to their article submissions

• Editors and Reviewers select terms that represent their areas of expertise

• Staff and Editors use these terms to help ensure editors and reviewers are well matched to the submissions they are handling
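
Matching on shared terms can be illustrated with a simple set-overlap score. The sketch below uses the Jaccard index to rank reviewers against a submission. It is an assumption about how such matching could be scored, not a description of what Editorial Manager does, and the function names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_reviewers(paper_terms: set[str],
                   reviewers: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank reviewers by overlap between their expertise terms and the paper's."""
    scored = [(name, jaccard(paper_terms, terms))
              for name, terms in reviewers.items()]
    # Highest overlap first; ties broken by name for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A score like this only suggests candidates; staff and editors would still make the final call.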


Page 58: Case Study: Public Library of Science Thesaurus: Year One

Planned Enhancements

• Use MAIstro to automate the application of terms to Editors, Reviewers and submitted articles

• Provide Editors and Staff with detailed terms to assist with reviewer selection and vetting

– Academic disciplines help Editors gauge Subject Area relevance of potential Reviewers

– Methods, protocols and model organisms help Editors gauge technical suitability of potential Reviewers

Page 59: Case Study: Public Library of Science Thesaurus: Year One

Dramatis personae:

Jonas Dupuich, Product Manager

Patrick Polischuk, Product Manager

Sebastian Toomey, Interaction Designer

Jennifer Lin, Senior Product Manager

Martin Fenner, ALM Technical Lead

Kallie Huss, Senior Publications Assistant

John Chodacki, Director - Product Management
