
Literature Services Resource Description Framework

Jee-Hyub Kim

Literature Services, EMBL-EBI

21 May 2015

1 / 15

Contents

1 Europe PMC and Linking Literature

2 Publishing Text-Mined Data as RDF

3 Text-Mining RDF Service

4 Discussion

2 / 15

Europe PMC

• Europe PMC is a literature database [1].

• Abstracts: 30 million PubMed, Agricola and patent records, updated daily

• Full text articles: over 3 million full text articles, of which over 900,000 are free to read and reuse, updated daily

• Powerful and easy search

• Search all article content through one simple search interface, supported by deep search options for advanced users.

3 / 15

Linking Literature

• Europe PMC provides several types of links from the literature:

• External links: to any resource (e.g., databases, Wikipedia, press releases)
• Citations: to literature
• BioEntities (produced by the Europe PMC text-mining pipeline)

• Biological entities: to concepts
• Accession numbers: to data

• Example: http://europepmc.org/abstract/MED/21926972

4 / 15

Europe PMC Text-Mining Pipeline

• A pipeline of dictionary- and machine learning-based named entity taggers [3].

• 6 semantic types

• Genes/proteins
• Chemicals
• Organisms
• GO terms
• Disease terms
• EFO terms

• 20 accession numbers [2]:

• ENA, RefSNP, PDB, UniProt, OMIM, PFam, ArrayExpress, RefSeq, Data DOI, Ensembl, InterPro

• NCT, Bioproject, Biosample, Eudract, EMDB, PXD, GO, EGA, TreeFam

• Programmatic access available.
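The dictionary-based half of such a pipeline can be approximated by greedy longest-match lookup over word tokens. The sketch below is an illustration under stated assumptions, not the Europe PMC or Whatizit implementation: the dictionary, its `EXAMPLE:` identifiers, the `tag` function and the five-token window are all hypothetical, and the machine learning-based taggers are not modelled.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lower-cased surface form -> (semantic type, concept ID).
# The "EXAMPLE:" IDs are placeholders, not real database or ontology entries.
DICTIONARY: Dict[str, Tuple[str, str]] = {
    "p53": ("Gene/Protein", "EXAMPLE:GENE1"),
    "breast cancer": ("Disease", "EXAMPLE:DISEASE1"),
    "cancer": ("Disease", "EXAMPLE:DISEASE2"),
}

# (start offset, end offset, surface form, semantic type, concept ID)
Match = Tuple[int, int, str, str, str]


def tag(text: str, dictionary: Dict[str, Tuple[str, str]], max_tokens: int = 5) -> List[Match]:
    """Greedy longest-match dictionary tagging over word tokens."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches: List[Match] = []
    i = 0
    while i < len(tokens):
        found = False
        # Try the longest token window first, so "breast cancer" wins over "cancer".
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            entry = dictionary.get(text[start:end].lower())
            if entry is not None:
                matches.append((start, end, text[start:end], *entry))
                i += n  # skip the tokens consumed by this match
                found = True
                break
        if not found:
            i += 1
    return matches


print(tag("Loss of p53 is common in breast cancer.", DICTIONARY))
```

Preferring the longest window resolves nested names such as "breast cancer" versus "cancer" to the more specific concept; a production tagger would also need spelling-variant normalisation and disambiguation, which this sketch omits.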

5 / 15

Publishing Text-Mined Data

• Beyond BioEntities Tab

• Goals

• More connectivity
• More context for each link
• Links that can be shared

• Challenge: handling nearly a billion automatically generated annotations

• Approach: use the Web Annotation Data Model.

6 / 15

Web Annotation Data Model

• Built on top of RDF

• Annotations as resources

• To provide a standard description mechanism for sharing annotations between systems

• Intended for general-purpose use

• Not only for text mining
• For example, comments on YouTube videos written by people, image annotation, etc.

• W3C Working Draft: http://www.w3.org/TR/2014/WD-annotation-model-20141211/

7 / 15

Core Annotation Framework

• Typically, an Annotation has a single Body, which is the comment or other descriptive resource, and a single Target that the Body is somehow "about".

• The Body provides the information which is annotating the Target.

• This "aboutness" may be further clarified or extended to notions such as classifying or identifying.
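A minimal sketch of the Body/Target pattern, assuming plain N-Triples output and using only the Open Annotation namespace (`http://www.w3.org/ns/oa#`) terms `oa:Annotation`, `oa:hasBody` and `oa:hasTarget`. The `example.org` IRIs and the helper names are hypothetical placeholders.

```python
from typing import List

# Open Annotation namespace used by the W3C drafts, and the rdf:type predicate.
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def annotation_triples(annotation: str, body: str, target: str) -> List[str]:
    """One annotation with a single Body and a single Target, as N-Triples lines."""
    return [
        f"{iri(annotation)} {iri(RDF_TYPE)} {iri(OA + 'Annotation')} .",
        f"{iri(annotation)} {iri(OA + 'hasBody')} {iri(body)} .",
        f"{iri(annotation)} {iri(OA + 'hasTarget')} {iri(target)} .",
    ]


# Hypothetical IRIs: an annotation saying that a document is "about" a concept.
for line in annotation_triples(
    "http://example.org/annotation/1",
    "http://example.org/concept/1",
    "http://example.org/document/1",
):
    print(line)
```

Because the annotation has its own IRI, further statements (provenance, confidence, context) can be attached to it as ordinary triples.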

8 / 15

One Scenario: Text Comment On Web Page

• A textual comment on a selection of text within a web page

• How to select a text fragment?

• Text Position Selector: oa:start, oa:end
• Text Quote Selector: oa:exact, oa:prefix, oa:suffix
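The two selectors can be computed from character offsets as follows. This is a sketch: the dataclasses, the `make_selectors`/`resolve_quote` helpers and the five-character context window are assumptions for illustration. As in the model, `end` points just past the last selected character.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TextPositionSelector:
    start: int  # oa:start: offset of the first selected character
    end: int    # oa:end: offset just after the last selected character


@dataclass
class TextQuoteSelector:
    exact: str   # oa:exact: the selected text itself
    prefix: str  # oa:prefix: text immediately before the selection
    suffix: str  # oa:suffix: text immediately after the selection


def make_selectors(text: str, start: int, end: int,
                   context: int = 10) -> Tuple[TextPositionSelector, TextQuoteSelector]:
    """Describe the span text[start:end] with both selector types."""
    if not 0 <= start < end <= len(text):
        raise ValueError("span must satisfy 0 <= start < end <= len(text)")
    quote = TextQuoteSelector(
        exact=text[start:end],
        prefix=text[max(0, start - context):start],
        suffix=text[end:end + context],
    )
    return TextPositionSelector(start, end), quote


def resolve_quote(text: str, quote: TextQuoteSelector) -> int:
    """Re-anchor a quote in (possibly edited) text; return its start offset or -1."""
    hit = text.find(quote.prefix + quote.exact + quote.suffix)
    return -1 if hit < 0 else hit + len(quote.prefix)


sentence = "Loss of p53 is common in breast cancer."
position, quote = make_selectors(sentence, 8, 11, context=5)
print(position, quote)
# The quote still resolves after text is inserted in front; the stored offsets would not.
print(resolve_quote("Note: " + sentence, quote))
```

The position selector is compact but breaks as soon as the text shifts; the quote selector costs more storage yet can be re-anchored by searching for prefix + exact + suffix, one reason to record both.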

9 / 15

Text Quote Selector

10 / 15

A Model for Annotation
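As a rough illustration of how a single text-mined mention might be expressed with this model, the sketch below serialises one annotation as Turtle. It assumes the matched concept is the Body and the Target is an `oa:SpecificResource` whose `oa:hasSource` is the article and whose `oa:hasSelector` is an `oa:TextQuoteSelector`. The IRIs and helper names are hypothetical, and the graph behind the actual service (which also records sentence and section context) may differ.

```python
from typing import List


def literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def annotation_turtle(annotation: str, body: str, source: str,
                      exact: str, prefix: str, suffix: str) -> str:
    """One annotation whose Target is a quoted span of a source document."""
    lines: List[str] = [
        "@prefix oa: <http://www.w3.org/ns/oa#> .",
        "",
        f"<{annotation}> a oa:Annotation ;",
        f"    oa:hasBody <{body}> ;",
        "    oa:hasTarget [",
        "        a oa:SpecificResource ;",
        f"        oa:hasSource <{source}> ;",
        "        oa:hasSelector [",
        "            a oa:TextQuoteSelector ;",
        f"            oa:exact {literal(exact)} ;",
        f"            oa:prefix {literal(prefix)} ;",
        f"            oa:suffix {literal(suffix)}",
        "        ]",
        "    ] .",
    ]
    return "\n".join(lines)


# Hypothetical IRIs for the concept and the article.
print(annotation_turtle(
    "http://example.org/annotation/1",
    "http://example.org/concept/p53",
    "http://example.org/article/1",
    exact="p53", prefix="Loss of ", suffix=" is common",
))
```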

11 / 15

Service Description

• Running on the EBI RDF Platform

• Stores 1,563,241,810 triples text-mined from 400,746 Open Access articles in Europe PubMed Central.

• Provides

• For each article, all the annotations linking to ontologies/databases
• With context for each annotation:

• Sentences
• Section information
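A back-of-envelope check puts the store size in per-article terms and extrapolates to the 3 million full text articles raised in the Discussion. The extrapolation assumes, without verification, that triples grow linearly with article count and that an average full text article resembles the Open Access subset.

```latex
\begin{aligned}
\frac{1\,563\,241\,810\ \text{triples}}{400\,746\ \text{articles}} &\approx 3\,901\ \text{triples per article} \\
3\,000\,000\ \text{articles} \times 3\,901\ \tfrac{\text{triples}}{\text{article}} &\approx 1.17 \times 10^{10}\ \text{triples}
\end{aligned}
```

Under these assumptions the full corpus would need roughly 11.7 billion triples, about 7.5 times the current store.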

12 / 15

Use Case for Database Curation

• Given a database identifier, the service provides sentence-level information for database curation.

1 Show all the articles in which the PDB accession number 3NSS is mentioned.

2 Show all the annotations in PMC3382907, each with its label.

3 Show all the articles in which inflammatory bowel disease (C0021390) is mentioned.

• http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql
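Use case 1 could be posed to the endpoint as a SPARQL query sent via the SPARQL 1.1 Protocol (`query` parameter over HTTP GET). The sketch only builds the request URL and does not contact the server. The graph shape (annotation → `oa:hasBody` concept, `oa:hasTarget` → `oa:hasSource` article) and the identifiers.org IRI for PDB 3NSS are assumptions; the real service's vocabulary should be checked before use.

```python
from urllib.parse import urlencode

# Endpoint URL from the slide (a 2015 development server; it may no longer respond).
ENDPOINT = "http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql"


def articles_mentioning(body_iri: str) -> str:
    """SPARQL for use case 1, ASSUMING an oa:hasBody / oa:hasTarget / oa:hasSource graph shape."""
    return f"""PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT DISTINCT ?article WHERE {{
  ?annotation oa:hasBody <{body_iri}> ;
              oa:hasTarget ?target .
  ?target oa:hasSource ?article .
}}"""


def get_url(query: str) -> str:
    """SPARQL 1.1 Protocol query via GET: the query text goes in the 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


# Hypothetical body IRI for PDB entry 3NSS; the service's actual IRI scheme may differ.
query = articles_mentioning("http://identifiers.org/pdb/3NSS")
print(query)
print(get_url(query))
```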

13 / 15

Discussion

• Can we deal with the number of triples generated from 3 million full text articles?

• A better URI scheme, e.g., http://europepmc.org/articles/PMC4298172/methods/genes/TEM-1/23

• Interoperability with other formats used in text-mining community

• e.g., BioC, UIMA

• Questions?
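The URI scheme floated in the Discussion can be made concrete with a small builder and parser. In this sketch the segment meanings (article, section, semantic type, entity name) are read off the example URI, while the final `23` is assumed to be an occurrence index because the slide does not define it; the `MentionURI` type and the function names are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import quote, unquote, urlsplit

BASE = "http://europepmc.org/articles"


class MentionURI(NamedTuple):
    article: str        # e.g. "PMC4298172"
    section: str        # e.g. "methods"
    semantic_type: str  # e.g. "genes"
    entity: str         # e.g. "TEM-1"
    occurrence: int     # ASSUMPTION: read here as the n-th mention; not defined on the slide


def build(m: MentionURI) -> str:
    """Percent-encode each segment so names containing '/' or spaces stay one segment."""
    parts = [m.article, m.section, m.semantic_type, m.entity, str(m.occurrence)]
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def parse(uri: str) -> MentionURI:
    """Split the path first, then decode each segment."""
    segments = [unquote(s) for s in urlsplit(uri).path.strip("/").split("/")]
    if len(segments) != 6 or segments[0] != "articles":
        raise ValueError(f"not a mention URI: {uri}")
    _, article, section, semantic_type, entity, occurrence = segments
    return MentionURI(article, section, semantic_type, entity, int(occurrence))


print(parse("http://europepmc.org/articles/PMC4298172/methods/genes/TEM-1/23"))
```

Percent-encoding each segment keeps entity names that contain `/` or spaces from splitting into extra path segments, one practical issue any such scheme would have to settle.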

14 / 15

References

[1] The Europe PMC Consortium. Europe PMC: a full-text literature database for the life sciences and platform for innovation. Nucleic Acids Research, 2014.

[2] Senay Kafkas, Jee-Hyub Kim, and Johanna R. McEntyre. Database citation in full text biomedical articles. PLoS ONE, 8(5):e63184, May 2013.

[3] Dietrich Rebholz-Schuhmann, Miguel Arregui, Sylvain Gaudan, Harald Kirsch, and Antonio J. Yepes. Text processing through web services: calling Whatizit. Bioinformatics, btm557, November 2007.

15 / 15