Open access – making the most of biomedical literature mining

Lars Juhl Jensen, EMBL Heidelberg


Open Access in the Natural Sciences, Copenhagen University Library, Copenhagen, Denmark, September 8, 2006


why open access?

why biomedicine?

why literature mining?

MEDLINE

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.

information retrieval

finding the papers

if you can’t find them …

… they don’t exist!

ad hoc retrieval

user-specified query

“yeast AND cell cycle”

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.
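The ad hoc query can be sketched as a plain Boolean AND over text. This is a minimal illustration, not how MEDLINE is actually searched: `matches_all`, `search`, and the `toy_abstract` entry are hypothetical names and data made up for the example. It shows that the example sentence, although clearly about the yeast cell cycle, contains neither query term literally.

```python
def matches_all(text, terms):
    """Return True if every query term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def search(docs, terms):
    """Return the ids of all documents that contain every query term."""
    return [doc_id for doc_id, text in docs.items() if matches_all(text, terms)]


# The example sentence from the slides.
SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly "
    "phosphorylated Swe1 and this modification served as a priming "
    "step to promote subsequent Cdc5-dependent Swe1 "
    "hyperphosphorylation and degradation"
)

# Hypothetical two-document corpus; "toy_abstract" is invented for contrast.
DOCS = {
    "swe1_abstract": SENTENCE,
    "toy_abstract": "The yeast cell cycle is driven by cyclin-dependent kinases.",
}

# The query "yeast AND cell cycle" from the slide.
hits = search(DOCS, ["yeast", "cell cycle"])
print(hits)
```

Only `toy_abstract` is returned: the relevant sentence is missed because it never uses the words "yeast" or "cell cycle".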

MEDLINE

abstracts

complete papers

tricks

stemming

yeast / yeasts

synonyms

yeast / S. cerevisiae

dynamic query expansion
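The retrieval tricks above can be sketched as follows. This is an assumption-laden toy: the `stem` function only strips a plural "s" (real systems use proper stemmers), and the `SYNONYMS` table is a hypothetical hand-made dictionary holding the yeast / S. cerevisiae pair from the slide.

```python
# Hypothetical synonym table; only the yeast / S. cerevisiae pair is from the slides.
SYNONYMS = {
    "yeast": {"s. cerevisiae", "saccharomyces cerevisiae"},
}


def stem(word):
    """Very crude stemmer: strip one trailing plural 's' (yeasts -> yeast)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def expand(term):
    """Expand a query term to its stem plus any known synonyms."""
    base = stem(term.lower())
    return {base} | SYNONYMS.get(base, set())


def matches_any(text, term):
    """Return True if the text contains any expansion of the query term."""
    lowered = text.lower()
    return any(variant in lowered for variant in expand(term))


print(sorted(expand("Yeasts")))
print(matches_any("Cdc28 is essential in S. cerevisiae.", "yeasts"))
```

With expansion, a query for "yeasts" also finds text that only mentions S. cerevisiae.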

next logical step

ontologies

annotation

Cdc28 yeast gene

Cdc28 cell cycle

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.

“yeast AND cell cycle”
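A minimal sketch of annotation-based retrieval, assuming a hypothetical `ANNOTATIONS` table that records the two facts from the slides (Cdc28 is a yeast gene; Cdc28 is involved in the cell cycle). Instead of matching the query words literally, the sentence is first mapped to the concepts implied by the names it mentions.

```python
import re

# Hypothetical annotation table; its two labels are the ones shown on the slides.
ANNOTATIONS = {
    "cdc28": {"yeast", "cell cycle"},
}

SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly "
    "phosphorylated Swe1 and this modification served as a priming "
    "step to promote subsequent Cdc5-dependent Swe1 "
    "hyperphosphorylation and degradation"
)


def concepts(text):
    """Collect the concepts implied by annotated names mentioned in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    found = set()
    for name, labels in ANNOTATIONS.items():
        if name in tokens:
            found |= labels
    return found


def matches_query(text, query_concepts):
    """Boolean AND over concepts rather than over literal words."""
    return set(query_concepts) <= concepts(text)


print(matches_query(SENTENCE, ["yeast", "cell cycle"]))
```

The sentence now satisfies "yeast AND cell cycle" even though neither term appears in it, because the match goes through the annotation of Cdc28.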

entity recognition

identifying the substance(s)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.

if you can’t find them …

… they don’t exist!

abstracts

MEDLINE

tricks

good synonyms list

manual curation

orthographic variation

CDC28

Cdc28p

disambiguation

hairy

SDS

Cdc2
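The synonyms list and orthographic variation can be illustrated with a small dictionary-based normalizer. This is a sketch under assumptions: `LEXICON` is a hypothetical list built from the names in the example sentence, and the only variations handled are case, hyphens, spaces, and the trailing "p" that yeast nomenclature uses for proteins (Cdc28p). It does not attempt disambiguation of names such as hairy, SDS, or Cdc2.

```python
import re

# Hypothetical lexicon of canonical (lowercased) names from the example sentence.
LEXICON = {"cdc28", "swe1", "cdc5", "clb2"}


def normalize(name):
    """Map an orthographic variant to its lexicon entry, or None if unknown."""
    key = re.sub(r"[\s\-]", "", name.lower())  # ignore case, spaces, hyphens
    if key in LEXICON:
        return key
    # Yeast protein names are often written with a trailing "p" (Cdc28p).
    if key.endswith("p") and key[:-1] in LEXICON:
        return key[:-1]
    return None


for variant in ["CDC28", "Cdc28p", "Cdc-28", "hairy"]:
    print(variant, "->", normalize(variant))
```

All three spellings of Cdc28 map to the same entry, while a word outside the lexicon is left unrecognized.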

information extraction

formalizing the facts

co-mentioning

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.
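Co-mentioning can be sketched as counting how often two recognized names occur in the same sentence. The `NAMES` set and the `comention_counts` function are hypothetical; a real system would take the names from the entity recognition step and would typically weight or score the counts.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical set of recognized names, taken from the example sentence.
NAMES = {"Clb2", "Cdc28", "Swe1", "Cdc5"}

SENTENCE = (
    "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly "
    "phosphorylated Swe1 and this modification served as a priming "
    "step to promote subsequent Cdc5-dependent Swe1 "
    "hyperphosphorylation and degradation"
)


def comention_counts(sentences):
    """Count, for each unordered pair of names, the sentences mentioning both."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        present = sorted(NAMES & tokens)  # sorted so each pair has one fixed order
        counts.update(combinations(present, 2))
    return counts


counts = comention_counts([SENTENCE])
for pair, n in sorted(counts.items()):
    print(pair, n)
```

The single sentence yields six co-mentioned pairs among the four names. Note that co-mentioning says nothing about the type or direction of the relation, for example that Cdc28 phosphorylates Swe1.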

NLP (Natural Language Processing)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
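The step from a bracketed sentence to a formal fact can be approximated with a single hand-written pattern. This is only a sketch: the regular expression, the `GENES` list, and the relation label are assumptions made for illustration, and the system shown on the slides uses a full grammar over tagged chunks rather than one regex.

```python
import re

# Hypothetical gene/protein name list, as entity recognition would supply it.
GENES = ["CYC1", "CYC7", "HAP1"]

# One cue-verb pattern: "expression of <targets> is controlled by <regulator>".
PATTERN = re.compile(
    r"expression of (?P<targets>.+?) is controlled by (?P<regulator>\w+)"
)


def extract_relations(sentence):
    """Return (regulator, relation, target) triples found by the pattern."""
    match = PATTERN.search(sentence)
    if match is None:
        return []
    regulator = match.group("regulator")
    if regulator not in GENES:
        return []
    targets = [
        gene for gene in GENES
        if re.search(rf"\b{re.escape(gene)}\b", match.group("targets"))
    ]
    return [(regulator, "regulates expression of", target) for target in targets]


sentence = "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
for triple in extract_relations(sentence):
    print(triple)
```

Unlike co-mentioning, the extracted triples carry both the type and the direction of the relation: HAP1 regulates the expression of CYC1 and of CYC7.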

new discoveries

text mining

temporal trends

buzzwords

grant applications

global correlations

[Figure: Regulates vs. Regulated; counts 3279, 83, 3592; P < 9 × 10^-9]

transcriptional networks

[Figure: Phosphorylates vs. Phosphorylated; counts 1127, 44, 3704; P < 2 × 10^-7]

signal cascades

[Figure: Expression vs. Phosphorylation; counts 8107, 47, 3625; P < 5 × 10^-4]

integration of text and data

network mining

linking genes to diseases

multifactorial diseases

genotype to phenotype

where are we now?

abstracts

complete papers

restricted access

open access

the tools are there

now we need the text!

Acknowledgments

Jasmin Saric

Rossitza Ouzounova

Michael Kuhn

Isabel Rojas

Miguel Andrade

Peer Bork