Nordic Conference for Scholarly Communication 2008, Scandic Star Hotel, Lund, Sweden, April 21-23, 2008
Integration of biomedical literature and databases
Lars Juhl Jensen, EMBL Heidelberg
why integration?
why biomedicine?
why literature?
why databases?
open access databases
a lot of them
Duncan Hull, nodalpoint.org
PubChem
19.2 million compounds
GenBank
85 million sequences
89 billion nucleotides
UniProt
5.6 million sequences
PDB
50000 protein structures
BIND (Biomolecular Interaction Network Database)
DIP (Database of Interacting Proteins)
MINT (Molecular Interactions Database)
IntAct
BioGRID
204000 interactions
too many
incomplete
literature mining
MEDLINE
17.9 million citations
too much to read
information retrieval
finding the papers
user-specified query
“yeast AND cell cycle”
stemming
yeast / yeasts
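As a rough illustration of stemming, the sketch below strips a plural suffix so that "yeast" and "yeasts" map to the same index term. The rules and the names `naive_stem` and `matches` are hypothetical and far simpler than the stemmers real search engines use.

```python
def naive_stem(token: str) -> str:
    """Lowercase a token and strip a plural suffix (toy rules, not a real stemmer)."""
    word = token.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # e.g. studies -> study
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # e.g. yeasts -> yeast
    return word


def matches(query_term: str, text: str) -> bool:
    """True if any word in the text shares a stem with the query term."""
    stems = {naive_stem(w.strip(".,;()")) for w in text.split()}
    return naive_stem(query_term) in stems


print(matches("yeast", "Budding yeasts divide asymmetrically."))
```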
dynamic query expansion
yeast / S. cerevisiae
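Query expansion can be sketched as a synonym lookup that rewrites each query term into an OR clause. The `SYNONYMS` table and the `expand_query` helper are illustrative assumptions, not the vocabulary or API of any particular search engine.

```python
# Hypothetical synonym table; a real system would draw on a curated thesaurus.
SYNONYMS = {
    "yeast": ["yeast", "S. cerevisiae", "Saccharomyces cerevisiae"],
}


def expand_query(terms):
    """Turn a list of AND-ed terms into a Boolean query with OR-ed synonyms."""
    clauses = []
    for term in terms:
        variants = SYNONYMS.get(term.lower(), [term])
        quoted = [f'"{v}"' for v in variants]
        if len(quoted) > 1:
            clauses.append("(" + " OR ".join(quoted) + ")")
        else:
            clauses.append(quoted[0])
    return " AND ".join(clauses)


print(expand_query(["yeast", "cell cycle"]))
```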
ranking
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
no tool will find it
entity recognition
identifying the substance(s)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
Cdc28 yeast
Cdc28 cell cycle
synonyms list
orthographic variation
CDC28
Cdc28p
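One simple way to handle orthographic variation is to generate the expected spellings of each name and match them all. The sketch below assumes only case changes and a trailing "p" for the protein form; `name_variants` and `find_mentions` are hypothetical helpers, and real taggers cover many more variants (hyphens, spacing, Greek letters).

```python
import re


def name_variants(gene: str) -> set:
    """Generate a few spelling variants of a gene name (illustrative rules only)."""
    base = gene[0].upper() + gene[1:].lower()      # Cdc28
    return {base, base.upper(), base.lower(), base + "p"}


def find_mentions(gene: str, text: str) -> list:
    """Return every variant of the gene name found in the text, in order."""
    # Try longer variants first so that "Cdc28p" is not cut short to "Cdc28".
    variants = sorted(name_variants(gene), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b")
    return pattern.findall(text)


print(find_mentions("cdc28", "CDC28 encodes Cdc28p, also written Cdc28."))
```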
disambiguation
Cdc2
SDS
still too much to read
information extraction
formalizing the facts
co-mentioning
statistical methods
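Co-mentioning can be scored statistically by comparing how often two names appear in the same abstract with how often that would happen by chance. The sketch below uses pointwise mutual information as one such statistic; it is an assumption for illustration and not necessarily the scoring scheme behind these slides.

```python
import math
from collections import Counter
from itertools import combinations


def comention_scores(docs):
    """docs: list of sets of entity names, one set per abstract.

    Returns {(a, b): PMI} for every pair co-mentioned at least once,
    where PMI = log2( P(a, b) / (P(a) * P(b)) ).
    """
    n = len(docs)
    single = Counter()
    pair = Counter()
    for entities in docs:
        single.update(entities)
        pair.update(combinations(sorted(entities), 2))
    return {
        (a, b): math.log2(count * n / (single[a] * single[b]))
        for (a, b), count in pair.items()
    }


abstracts = [
    {"Cdc28", "Swe1"},
    {"Cdc28", "Swe1"},
    {"Cdc28", "Clb2"},
    {"Cdc5"},
]
for names, score in sorted(comention_scores(abstracts).items()):
    print(names, round(score, 3))
```

A positive score means the pair is co-mentioned more often than expected if the two names were distributed independently across abstracts.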
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
yet another database
integration
augmented browsing
semantic tagging
association networks
curated knowledge
genomic context
phylogenetic profiles
gene neighborhood
experimental data
physical interactions
genetic interactions
literature mining
restricted access
Bayesian framework
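The slides do not spell out the Bayesian framework. A minimal sketch of one common approach, assuming each evidence type yields an independent probability that an association is real, combines them as 1 minus the product of (1 - p). The function name `combine_evidence` and the example scores are hypothetical.

```python
from math import prod


def combine_evidence(scores):
    """Combine per-channel probabilities, assuming the channels are independent.

    The association is missed only if every channel misses it, so the
    combined probability is 1 - prod(1 - p_i).
    """
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


# Made-up scores for three evidence types, e.g. literature mining,
# experimental data and genomic context.
print(round(combine_evidence([0.5, 0.4, 0.2]), 3))
```

Under this assumption, several weak but independent lines of evidence can add up to a confident association, which is the point of integrating sources rather than relying on any single one.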
summary
literature mining is good
data integration is better
open access
Acknowledgments
STRING & STITCH
– Christian von Mering
– Michael Kuhn
– Manuel Stark
– Samuel Chaffron
– Philippe Julien
– Tobias Doerks
– Jan Korbel
– Berend Snel
– Martijn Huynen
– Peer Bork
Reflect
– Evangelos Pafilis
– Michael Kuhn
– Sean O’Donoghue
– Reinhard Schneider
Natural Language Processing
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas
– Peer Bork