The Role of Ontologies in Improved Scholarly Communication Philip E. Bourne University of California San Diego pbourne@ucsd.edu http://www.sdsc.edu/pb


Page 1

The Role of Ontologies in Improved Scholarly Communication

Philip E. Bourne
University of California San Diego

pbourne@ucsd.edu
http://www.sdsc.edu/pb

Page 2

My Perspective …

• Ontology Developer (years ago – mmCIF - Bioinformatics 2002 18: 1280-128)
• Database Developer – RCSB PDB
• Supporter of open access (provided there is a business model) - editor in chief of PLoS Computational Biology
• Co-founder - SciVee Inc.
• I am becoming increasingly interested in scholarly communication
• I use ontologies to support this work

Page 3

Objective Today

• Describe how we are using ontologies to try to improve scholarly communication

• Motivate you towards thinking about ontologies that should be developed

• Learn from you where we might spend our efforts

Page 4

First Consider What Motivates Us to Improve Scholarly Communication

Page 5

We Cannot Possibly Read a Fraction of the Papers We Should

Drivers of Change Renear & Palmer 2009 Science 325:828-832

Page 6

Hence We Are Scanning More, Reading Less

Drivers of Change Renear & Palmer 2009 Science 325:828-832

Page 7

The Truth About the Scientific eLaboratory

• I have ?? mail folders!

• The intellectual memory of my laboratory is in those folders

• This is an unhealthy hub and spoke mentality

Drivers of Change

Page 8

The Truth About the Scientific eLaboratory

• I generate way more negative than positive data, but where is it?

• Content management is a mess
– Slides, posters…..
– Data, lab notebooks ….
– Collaborations, Journal clubs …

• Software is open but where is it?
• Farewell is for the data too

Drivers of Change

Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136

Page 9

Data and the Publication Are Disjoint

• PubMed contains 18,792,257 entries
• ~100,000 papers indexed per month
• In Feb 2009:
– 67,406,898 interactive searches were done
– 92,216,786 entries were viewed
• 1078 databases reported in NAR 2008
• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times

Biosciences Data as of April 14, 2009

Drivers of Change

Page 10

Publishing Limitations

• A paper is an artifact of a previous era
• It is not the logical end product of eScience, hence:
– Work is omitted
– Article vs supplement is a mess
– Visualization may be limited
– Interaction and enquiry are non-existent
– Rich media can help, but are rarely used

Drivers of Change

Page 11

We Need to do Better & The Game is Afoot

It is being driven from the top down and the bottom up

Page 12

Ontologies & Semantic Tagging

Page 13

BioLit Data Extraction/Storage

[Figure labels: Database IDs, Ontology terms, Text excerpts, Other…; BioLit; MySQL database; XML; XML, Meta-data; <web services>; web; external databases]

Semantic Tagging

Page 14

Tagging of PubMed Central

• Ontologies read from OBO Files
• Words converted to tree structures
• Matched to every non-trivial word in the paper
• Matches tagged
• A long paper can be matched to GO in less than 30 seconds

Semantic Tagging http://biolit.ucsd.edu
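The tagging steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BioLit's actual code: the OBO excerpt is a small hand-written sample, the helper names `tokenize`, `parse_obo`, `build_trie`, and `tag` are hypothetical, and the "tree structure" is assumed to be a word-level trie matched case-insensitively on whole words.

```python
import re

# Hand-written sample in OBO flat-file style (assumption: not taken from the talk).
SAMPLE_OBO = """\
[Term]
id: GO:0005515
name: protein binding

[Term]
id: GO:0006915
name: apoptotic process
synonym: "apoptosis" EXACT []
"""

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_obo(text):
    """Return (label, term_id) pairs for each term name and quoted synonym."""
    pairs, term_id = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("id: "):
            term_id = line[4:]
        elif line.startswith("name: ") and term_id:
            pairs.append((line[6:], term_id))
        elif line.startswith("synonym: ") and term_id:
            quoted = re.match(r'synonym: "([^"]+)"', line)
            if quoted:
                pairs.append((quoted.group(1), term_id))
    return pairs

def build_trie(pairs):
    """Build a word-level trie; the key "$id" marks the end of a label."""
    root = {}
    for label, term_id in pairs:
        node = root
        for word in tokenize(label):
            node = node.setdefault(word, {})
        node["$id"] = term_id
    return root

def tag(text, trie):
    """Return (matched phrase, term ID) for the longest match at each position."""
    words = tokenize(text)
    hits, i = [], 0
    while i < len(words):
        node, j, best = trie, i, None
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if "$id" in node:
                best = (" ".join(words[i:j]), node["$id"], j)
        if best:
            hits.append(best[:2])
            i = best[2]  # continue after the matched phrase
        else:
            i += 1
    return hits

trie = build_trie(parse_obo(SAMPLE_OBO))
hits = tag("Protein binding triggers apoptosis in T cells.", trie)
```

Words that start no ontology label fall through in a single dictionary lookup, which is one plausible reason a long paper can be matched quickly; the 30-second figure on the slide refers to the real system, not to this sketch.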

Page 15

Semantic Tagging http://biolit.ucsd.edu

Page 16

ICTP Trieste, December 10, 2007

Semantic Tagging http://biolit.ucsd.edu

Page 17

Provision of web services for this tagging may be the most valuable contribution.

Semantic Tagging

Page 18

Database & Literature Integration

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

Context

Semantic Tagging BMC Bioinformatics 2010 11:220

Page 19

Semantic Tagging of Database Content

http://www.pdb.org

Semantic Tagging PLoS Comp. Biol. 6(2) e1000673

Page 20

Automatic Knowledge Discovery for Those with No Time to Read

[Figure labels: Immunology Literature; Cardiac Disease Literature; Shared Function]

Semantic Tagging
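One way to picture how tagged papers could surface a shared function between two bodies of literature is a simple overlap of ontology term IDs. The slide does not say how the discovery is actually done, so the sketch below is an assumption throughout: the function names, the `min_papers` threshold, and the placeholder IDs such as `TERM:2` are all hypothetical.

```python
from collections import Counter

def paper_counts(tagged_papers):
    """Count, for each term ID, how many papers in a corpus mention it."""
    counts = Counter()
    for term_ids in tagged_papers:
        counts.update(set(term_ids))  # count each term once per paper
    return counts

def shared_terms(corpus_a, corpus_b, min_papers=2):
    """Return term IDs found in at least `min_papers` papers of both corpora."""
    a, b = paper_counts(corpus_a), paper_counts(corpus_b)
    return sorted(
        term for term in a.keys() & b.keys()
        if a[term] >= min_papers and b[term] >= min_papers
    )

# Placeholder data: each inner list holds the term IDs tagged in one paper.
immunology = [["TERM:1", "TERM:2"], ["TERM:2", "TERM:3"], ["TERM:2"]]
cardiac = [["TERM:2", "TERM:4"], ["TERM:2", "TERM:3"], ["TERM:4"]]
shared = shared_terms(immunology, cardiac)
```

A threshold such as `min_papers` is one simple guard against terms that appear only once by chance; a real analysis would likely also weight terms by how specific they are.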

Page 21

This is Literature Post-processing
Better to Get the Authors Involved

• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process

Semantic Tagging BMC Bioinformatics 2010 11:103

Page 22

Word 2007 Add-in for Authors

• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License

http://www.codeplex.com/ucsdbiolit

Drivers of Change
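To illustrate what "metadata embedded via XML tags" means in practice, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`p`, `term`, `id`, `ontology`) and the helper `mark_up` are hypothetical and chosen for readability; they are not the add-in's schema and not real OOXML markup.

```python
import xml.etree.ElementTree as ET

def mark_up(sentence, term, ontology, term_id):
    """Wrap the first occurrence of `term` in a hypothetical <term> element."""
    start = sentence.find(term)
    if start < 0:
        raise ValueError(f"{term!r} not found in sentence")
    para = ET.Element("p")
    para.text = sentence[:start]                   # text before the term
    tagged = ET.SubElement(para, "term", {"id": term_id, "ontology": ontology})
    tagged.text = term                             # the recognized term itself
    tagged.tail = sentence[start + len(term):]     # text after the term
    return para

para = mark_up("We observed apoptosis in treated cells.",
               "apoptosis", "GO", "GO:0006915")
xml_text = ET.tostring(para, encoding="unicode")
```

The visible text is unchanged by the markup, while a program can read the term ID back out of the document, which is the sense in which the embedded metadata are open and machine-readable.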

Page 23

Word 2007 Add-in Example of What it Looks Like - Ontologies

• Inline Recognition, Highlighting, and Mark-up of Informative Terms
– A recognized term will have a dotted, purple underline
– Hovering generates a Smart Tag above the term
  • add mark-up for this term
  • ignore this term
  • view the term in the ontology browser
  • If a recognized term appears in more than one ontology, all instances of that term will be listed
– Hovering over a marked-up term
  • option to apply mark-up to all recognized instances of term
  • stop recognizing a term
– Pass ontology terms back to provider

Semantic Tagging BMC Bioinformatics 2010 11:103

Page 24

• Built-in Knowledge of Ontologies and Databases
– Add-in provides a list of biomedical ontologies to download
– and a list of databases for ID recognition (GenBank/RefSeq, UniProt, Protein Data Bank)
– A user may also supply a URL to download other ontologies

• Ontology Browser
– allows a user to select an ontology and then navigate through it to view terms and their relationships

BMC Bioinformatics 2010 11:103

Page 25

Custom Metadata

• Ontologies do not contain all usages of a concept
• Add-in allows user to assign custom metadata
• Human Disease Ontology term: Leukemia, T-Cell, HTLV-II-Associated
• Synonym: Atypical hairy cell leukemia (disorder)
• Actual use in literature:
– hairy cell leukemia
– hairy-cell leukemia
– hairy T cell leukemia
– T cell hairy leukemia

BMC Bioinformatics 2010 11:103

Page 26

Synonym mapping, disambiguation

• Inclusion of an additional set of synonyms for a term that reflect its use in natural language
– Automated finding of synonyms in extant literature
– Gather synonyms from term-mapping databases
• Incorporate a more sophisticated term recognition approach into the add-in

BMC Bioinformatics 2010 11:103
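A small sketch shows how an extra synonym table could map the literature variants of "hairy cell leukemia" listed earlier onto one ontology term. It is an illustration only, not the add-in's method: the `key` and `resolve` helpers are hypothetical, and the normalization (ignoring case, hyphens, and word order) is an assumption that would produce false matches on real text without further disambiguation.

```python
import re

# Ontology term from the Custom Metadata slide.
CANONICAL = "Leukemia, T-Cell, HTLV-II-Associated"

def key(phrase):
    """Normalize a phrase: lowercase, drop punctuation, ignore word order."""
    return tuple(sorted(re.findall(r"[a-z0-9]+", phrase.lower())))

# Custom synonyms an author might register for the term (assumed table).
custom_synonyms = {
    key("hairy cell leukemia"): CANONICAL,
    key("hairy T cell leukemia"): CANONICAL,
}

def resolve(phrase):
    """Return the ontology term for a phrase, or None if it is unknown."""
    return custom_synonyms.get(key(phrase))

variants = [
    "hairy cell leukemia",
    "hairy-cell leukemia",
    "hairy T cell leukemia",
    "T cell hairy leukemia",
]
resolved = [resolve(v) for v in variants]
```

Two registered synonyms cover all four variants here because the key ignores hyphens and word order; gathering such synonyms automatically from the literature or from term-mapping databases is the harder problem the slide points to.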

Page 27

Challenges

• Author use
– Familiarity with ontologies, terms
– Agreement between co-authors
• End-use of semantically enriched manuscript
• Need to combine with NLM XML standard

Semantic Tagging BMC Bioinformatics 2010 11:103

Page 28

Challenges: Author Use

IF one or more publishers fast-tracked papers that had semantic markup, I would argue it would catch on in no time

Semantic Tagging BMC Bioinformatics 2010 11:103

Page 29

Where we Need {Better} Ontologies

1. To Support Mashups Between Different Types of Scholarly Output

Page 30

Post-publication of Video and Paper

www.scivee.tv

Drivers of Change

Page 31

Pubcast – Video Integrated with the Full Text of the Paper

Page 32

Pubcasts - A Unique Technology

Don’t understand what you are reading? Click and have the author pop-up and explain it!

See the scientists and the experiments behind the research papers and textbooks

Pubcasts - A Blend of Video, text, tables, figures, PowerPoints, comments, ratings…ALL SYNCHRONIZED FOR RAPID LEARNING

Mashups – www.scivee.tv

Page 33

Where we Need {Better} Ontologies

2. To Support Tagging of all Aspects of the Scholarly Product

Page 34

Consider Today’s Academic Workflow

[Figure labels: Research [Grants]; Journal Article; Conference Paper; Poster Session; Feds; Societies; Publishers; Reviews; Blogs; Community Service/Data Curation]

What Should be Done?

Page 35

Consider Tomorrow’s Academic Workflow

[Figure labels: Research [Grants]; Journal Article; Conference Paper; Poster Session; Feds; Societies; Publishers; Reviews; Blogs; Community Service/Data Curation; Ideas, Data, Hypotheses]

What Should be Done?

Page 36

Maybe The Line is Somewhere Else?

[Figure labels: Scientist; Idea; Experiment; Data; Conclusions; Publish; Laboratory; Publisher]

Page 37

Maybe The Line is Somewhere Else?

[Figure labels: Scientist; Idea; Experiment; Data; Conclusions; Publish; Laboratory; Publisher; Institution; Lab Notebook]

What Should We Do?

Page 38

Crowd Sourcing the Electronic Printing Press (aka Workshop: Beyond the PDF)

• Proposal to the US National Science Foundation:
• Aims:
– Define user requirements
– Establish a specification document
– Open source the development effort
– Have a commitment from a publisher to publish a research object using the system
– Act as an exemplar for what can be done

Page 39

Question: What if Everyone Had An Electronic Printing Press?

• Peer review might change?
• Bibliometrics might change?
• Business models will likely change?
• What happens to the database/literature divide?
• Societies might do more self publishing?
• We might have improved the dissemination of science, but will we have improved the comprehension?

Page 40

General References

• What Do I Want from the Publisher of the Future PLoS Comp Biol http://www.sdsc.edu/pb

• Fourth Paradigm: Data Intensive Scientific Discovery http://research.microsoft.com/enus/collaboration/fourthparadigm/

Page 41

References to Exemplars

• Semantic Biochemical Journal - 2010: Using Utopia
• Article of the Future, Cell, 2009:
• Prospect, Royal Society of Chemistry, 2009:
• Adventures in Semantic Publishing, Oxford U, 2009:
• The Structured Digital Abstract, Seringhaus/Gerstein, 2008
• CWA Nanopublications – 2010

Page 42

Acknowledgements

• BioLit Team
– Lynn Fink
– Parker Williams
– Marco Martinez
– Rahul Chandran
– Greg Quinn

• Microsoft Scholarly Communications
– Pablo Fernicola
– Lee Dirks
– Savas Parastatidis
– Alex Wade
– Tony Hey

• wwPDB team

• SciVee Team
– Apryl Bailey
– Tim Beck
– Leo Chalupa
– Lynn Fink
– Marc Friedman (CEO)
– Ken Liu
– Alex Ramos
– Willy Suwanto

http://www.scivee.tv
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit

Page 43

Questions?

pbourne@ucsd.edu