Literature Services Resource Description Framework
Jee-Hyub Kim
Literature Services, EMBL-EBI
21 May 2015
Contents
1 Europe PMC and Linking Literature
2 Publishing Text-Mined Data on RDF
3 Text-Mining RDF Service
4 Discussion
Europe PMC
• Europe PMC is a literature database [1].
  • Abstracts: 30 million PubMed, Agricola and patent records, updated daily
  • Full text articles: over 3 million full text articles, of which over 900,000 are free to read and reuse, updated daily
• Powerful and easy search
  • Search all article content through one simple search interface, supported by deep search options for advanced users.
Linking Literature
• Europe PMC links literature in several ways.
  • External Links: to any resource (e.g., database, Wikipedia, press release, etc.)
  • Citations: to literature
  • BioEntities (produced by the Europe PMC text-mining pipeline)
    • Biological entities: to concepts
    • Accession numbers: to data
• Example: http://europepmc.org/abstract/MED/21926972
Europe PMC Text-Mining Pipeline
• A pipeline of dictionary- and machine learning-based named entity taggers [3].
• 6 semantic types
  • Genes/proteins
  • Chemicals
  • Organisms
  • GO terms
  • Disease terms
  • EFO terms
• 20 types of accession numbers [2]:
  • ENA, RefSNP, PDB, UniProt, OMIM, Pfam, ArrayExpress, RefSeq, Data DOI, Ensembl, InterPro
  • NCT, BioProject, BioSample, EudraCT, EMDB, PXD, GO, EGA, TreeFam
• Programmatic access available.
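The dictionary-based half of such a pipeline can be illustrated with a small sketch. This is only a toy version of the general technique, not the Europe PMC implementation: the function name build_tagger, the two-entry dictionary, and the sample sentence are invented for the example.

```python
import re

def build_tagger(dictionary):
    """Return a function that tags dictionary terms in a text.

    `dictionary` maps a surface term to a semantic type label.
    This is a toy illustration, not the Europe PMC tagger.
    """
    if not dictionary:
        raise ValueError("dictionary must contain at least one term")
    # Try longer terms first so that the longest match wins.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {term.lower(): label for term, label in dictionary.items()}

    def tag(text):
        # Each hit is (start offset, end offset, matched text, semantic type).
        return [
            (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
            for m in pattern.finditer(text)
        ]

    return tag

# Hypothetical dictionary and sentence, for illustration only.
tag = build_tagger({"Escherichia coli": "Organisms", "ampicillin": "Chemicals"})
for hit in tag("Escherichia coli is resistant to ampicillin."):
    print(hit)
```

The character offsets returned with each match are what later allow an annotation to point back at the exact span in the article.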
Publishing Text-Mined Data
• Beyond BioEntities Tab
• Goals
  • More connectivity
  • More context for each link
  • Links to share
• Challenge: dealing with nearly a billion automatically generated annotations
• Approach: use the Web Annotation Data Model.
Web Annotation Data Model
• Built on top of RDF
• Annotations as resources
• To provide a standard description mechanism for sharing annotations between systems
• For more general purpose use
  • Not only for text mining
  • For example, YouTube video comments (by people), image annotation, etc.
• W3C Working Draft: http://www.w3.org/TR/2014/WD-annotation-model-20141211/
Core Annotation Framework
• Typically an Annotation has a single Body, which is the comment or other descriptive resource, and a single Target that the Body is somehow "about".
• The Body provides the information which is annotating the Target.
• This "aboutness" may be further clarified or extended to notions such as classifying or identifying.
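The Body/Target structure can be written down as three RDF triples. The sketch below is a minimal illustration using only the standard library; the helper names make_annotation and to_ntriples and the example.org URIs are hypothetical, while oa:Annotation, oa:hasBody and oa:hasTarget are terms from the Web Annotation vocabulary.

```python
from typing import List, NamedTuple

OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

def make_annotation(annotation_uri: str, body_uri: str, target_uri: str) -> List[Triple]:
    """Describe one annotation: a typed resource with one Body and one Target."""
    return [
        Triple(annotation_uri, RDF_TYPE, OA + "Annotation"),
        Triple(annotation_uri, OA + "hasBody", body_uri),
        Triple(annotation_uri, OA + "hasTarget", target_uri),
    ]

def to_ntriples(triples: List[Triple]) -> str:
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)

# The annotation and body URIs are placeholders (assumptions);
# the target is the example abstract page mentioned earlier in the slides.
example = make_annotation(
    "http://example.org/annotation/1",
    "http://example.org/concept/TEM-1",
    "http://europepmc.org/abstract/MED/21926972",
)
print(to_ntriples(example))
```

Because the annotation is itself a resource with its own URI, further statements (who made it, when, with what confidence) can be attached to it without touching the Body or the Target.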
One Scenario: Text Comment On Web Page
• A textual comment on a selection of text within a web page
• How to select a text fragment?
  • Text Position Selector: oa:start, oa:end
  • Text Quote Selector: oa:exact, oa:prefix, oa:suffix
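The two selector types can be compared on a toy sentence. This is a sketch under stated assumptions: the function names, the dictionary layout and the sample sentence are invented for illustration; the property names follow the W3C model, where the text after the quote is called oa:suffix and oa:end is exclusive, so the offsets line up with Python slicing.

```python
def text_position_selector(text: str, exact: str, occurrence: int = 0) -> dict:
    """Locate the n-th occurrence of `exact` and describe it by character offsets."""
    start = -1
    for _ in range(occurrence + 1):
        start = text.find(exact, start + 1)
        if start == -1:
            raise ValueError(f"{exact!r} does not occur often enough in the text")
    # oa:start is inclusive and oa:end is exclusive.
    return {
        "type": "oa:TextPositionSelector",
        "oa:start": start,
        "oa:end": start + len(exact),
    }

def text_quote_selector(text: str, start: int, end: int, context: int = 10) -> dict:
    """Describe the same span by its own text plus some surrounding context."""
    return {
        "type": "oa:TextQuoteSelector",
        "oa:exact": text[start:end],
        "oa:prefix": text[max(0, start - context):start],
        "oa:suffix": text[end:end + context],
    }

# Made-up sentence, for illustration only.
sentence = "TEM-1 beta-lactamase confers resistance to ampicillin."
position = text_position_selector(sentence, "beta-lactamase")
quote = text_quote_selector(sentence, position["oa:start"], position["oa:end"])
print(position)
print(quote)
```

A position selector is compact but breaks as soon as the source text changes; a quote selector is more verbose but can still be re-located after small edits to the page.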
Service Description
• Running on EBI RDF Platform
• Stores 1,563,241,810 triples text-mined from 400,746 Open Access articles in Europe PubMed Central.
• Provides
  • for each article, all the annotations linking to ontologies/databases
  • with contexts:
    • sentences
    • section information
Use Case for Database Curation
• Given a database identifier, the service provides sentence-level information for database curation.
  1 Show all the articles where the PDB accession number 3NSS is mentioned.
  2 Show all the annotations in PMC3382907, each with its label.
  3 Show all the articles where inflammatory bowel disease (C0021390) is mentioned.
• http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql
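The first use case could be phrased as a SPARQL query along the following lines. This is a sketch only: the slides do not show the service's actual graph layout, so the triple pattern (oa:hasBody, oa:hasTarget, oa:hasSource), the function names and the example.org URI standing in for the 3NSS resource are assumptions. The code builds the query and a GET URL following the standard SPARQL protocol `query` parameter, but does not send the request.

```python
from urllib.parse import urlencode

# Endpoint as given on the slide (a development server, so it may not be reachable).
ENDPOINT = "http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql"

def articles_mentioning(body_uri: str) -> str:
    """Build a query for all articles with an annotation whose Body is `body_uri`.

    The graph pattern is an assumed layout based on the Web Annotation model.
    """
    if any(ch in body_uri for ch in '<>" \n\t'):
        raise ValueError("body_uri contains characters not allowed in a URI reference")
    return f"""PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT DISTINCT ?article WHERE {{
  ?annotation a oa:Annotation ;
              oa:hasBody <{body_uri}> ;
              oa:hasTarget ?target .
  ?target oa:hasSource ?article .
}}"""

def request_url(query: str) -> str:
    """Encode the query as a SPARQL-protocol GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": query})

# Placeholder URI for PDB entry 3NSS (hypothetical; the real body URI is not given).
query = articles_mentioning("http://example.org/pdb/3NSS")
print(query)
print(request_url(query))
```

The other two use cases would follow the same shape, fixing the article instead of the Body for the second, and using the disease concept as the Body for the third.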
Discussion
• Can we deal with a large number of triples from 3 million full text articles?
• A better URI scheme: e.g., http://europepmc.org/articles/PMC4298172/methods/genes/TEM-1/23
• Interoperability with other formats used in text-mining community
  • e.g., BioC, UIMA
• Questions?
References
[1] The Europe PMC Consortium. Europe PMC: a full-text literature database for the life sciences and platform for innovation. Nucleic Acids Research, 2014.
[2] Senay Kafkas, Jee-Hyub Kim, and Johanna R. McEntyre. Database citation in full text biomedical articles. PLoS ONE, 8(5):e63184, 05 2013.
[3] Dietrich Rebholz-Schuhmann, Miguel Arregui, Sylvain Gaudan, Harald Kirsch, and Antonio J. Yepes. Text processing through web services: Calling Whatizit. Bioinformatics, pages btm557+, November 2007.