>10 km
too much to read
exponential growth
~40 seconds per paper
computer
as smart as a dog
teach it specific tricks
information retrieval
named entity recognition
information extraction
text/data integration
medical text mining
information retrieval
find the relevant papers
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
PubMed
indexing
fast lookup
stemming
word endings
dynamic query expansion
MeSH terms
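
The indexing and stemming steps above can be sketched in a few lines of Python. This is a minimal illustration, not how PubMed is implemented: the suffix-stripping `stem` function is a crude stand-in for a real stemmer, the three documents are made up, and all function names are hypothetical. The Boolean AND is implicit, so the query is written without the operator.

```python
import re
from collections import defaultdict

def stem(word):
    """Crude suffix stripping (assumption: a stand-in for a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lower-case the text, split it into alphanumeric tokens, and stem each one."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def build_index(docs):
    """Inverted index: stemmed term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_and(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))

# Made-up example documents
docs = {
    1: "Cell cycles in budding yeast",
    2: "Yeast genetics and metabolism",
    3: "The cell cycle of human cells",
}
index = build_index(docs)
hits = search_and(index, "yeast cell cycle")
print(hits)  # {1}
```

Because of stemming, document 1 is found even though it says "cycles" rather than "cycle". Query expansion with MeSH terms would add synonyms to the query before the lookup and is not shown here.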
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1
and this modification served as a priming step to promote subsequent
Cdc5-dependent Swe1 hyperphosphorylation and degradation
no tool will find that
named entity recognition
identify the concepts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1
and this modification served as a priming step to promote subsequent
Cdc5-dependent Swe1 hyperphosphorylation and degradation
comprehensive lexicon
CDC2
cyclin dependent kinase 1
orthographic variation
flexible matching
upper- and lower-case
CDC2
Cdc2
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
name expansions
prefixes and postfixes
CDC2
hCDC2
“black list”
SDS
efficient tagger
Pafilis et al., PLOS ONE, 2013
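
As a rough sketch of the flexible matching rules above (case, spaces and hyphens, prefixes, black list), the following Python looks up a single candidate name in a toy lexicon. It is not the tagger of Pafilis et al.; the lexicon contents, the identifiers, the prefix list, and the function names are all assumptions made for illustration.

```python
import re

# Toy lexicon: normalized name -> identifier (illustrative values only)
LEXICON = {
    "cdc2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "sds": "SDS",
}
# Names that match the lexicon but usually mean something else in running text
BLACKLIST = {"sds"}
# Prefixes tolerated in front of a known name, e.g. "hCDC2" for the human protein
PREFIXES = ("h",)

def normalize(name):
    """Lower-case and treat hyphens and runs of whitespace as one space."""
    return re.sub(r"[-\s]+", " ", name.lower()).strip()

def lookup(name):
    """Return the identifier for a candidate name, or None if there is no match."""
    key = normalize(name)
    if key in BLACKLIST:
        return None
    if key in LEXICON:
        return LEXICON[key]
    for prefix in PREFIXES:
        if key.startswith(prefix):
            stripped = key[len(prefix):]
            if stripped in LEXICON and stripped not in BLACKLIST:
                return LEXICON[stripped]
    return None

for candidate in ["CDC2", "Cdc2", "cyclin-dependent kinase 1", "hCDC2", "SDS"]:
    print(candidate, "->", lookup(candidate))
```

A real tagger would scan whole documents for all lexicon names at once; the per-name lookup here only demonstrates the normalization rules.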
benchmarking
the formal way
manually annotated corpus
precision
recall
much work
the pragmatic way
random sampling
precision
no recall
much less work
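
The two benchmarking routes can be made concrete with a small sketch. The formal way compares tagger output with a manually annotated corpus and yields both precision and recall; the pragmatic way only checks a random sample of the output by hand, which gives precision but no recall. The function names and example annotations below are hypothetical.

```python
def precision_recall(predicted, gold):
    """Formal benchmark: compare predicted annotations with a gold-standard corpus."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

def sampled_precision(judgements):
    """Pragmatic benchmark: fraction of randomly sampled matches judged correct.

    `judgements` is a list of booleans from manual inspection. Missed entities
    never enter the sample, so recall cannot be estimated this way.
    """
    return sum(judgements) / len(judgements)

# Made-up (document, entity) annotations
predicted = {("doc1", "CDK1"), ("doc1", "SWE1"), ("doc2", "SDS"), ("doc2", "CDC5")}
gold = {("doc1", "CDK1"), ("doc1", "SWE1"), ("doc2", "CDC5"),
        ("doc3", "CLB2"), ("doc3", "HAP1")}

p, r = precision_recall(predicted, gold)
print(p, r)
print(sampled_precision([True, True, False, True]))
```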
augmented browsing
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1
and this modification served as a priming step to promote subsequent
Cdc5-dependent Swe1 hyperphosphorylation and degradation
Reflect
reflect.ws
information extraction
formalize the facts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1
and this modification served as a priming step to promote subsequent
Cdc5-dependent Swe1 hyperphosphorylation and degradation
two approaches
the formal way
NLP (Natural Language Processing)
grammatical analysis
part-of-speech tagging
multiword detection
semantic tagging
sentence parsing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
extract stated facts
high precision
poor recall
the pragmatic way
guilt by association
co-mentioning
counting
within documents
within paragraphs
within sentences
quality score
high recall
high precision
undirected associations
unknown type
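
Co-mention counting can be sketched as follows. This is a minimal illustration under stated assumptions: entities are already tagged, only the document and sentence levels are counted (the paragraph level is omitted for brevity), and the weights in `weighted_score` are arbitrary placeholders, not the scoring scheme of any particular resource.

```python
from collections import Counter
from itertools import combinations

def comention_counts(documents):
    """Count entity pairs co-mentioned within documents and within sentences.

    `documents` is a list of documents; each document is a list of sentences;
    each sentence is the set of entity identifiers tagged in it.
    """
    doc_counts, sent_counts = Counter(), Counter()
    for sentences in documents:
        doc_entities = set().union(*sentences)
        for pair in combinations(sorted(doc_entities), 2):
            doc_counts[pair] += 1
        for entities in sentences:
            for pair in combinations(sorted(entities), 2):
                sent_counts[pair] += 1
    return doc_counts, sent_counts

def weighted_score(pair, doc_counts, sent_counts, w_doc=1.0, w_sent=2.0):
    """Combine the counts; the weights are hypothetical and would need tuning."""
    return w_doc * doc_counts[pair] + w_sent * sent_counts[pair]

# Made-up tagged corpus: two documents with two sentences each
documents = [
    [{"CDC28", "CLB2", "SWE1"}, {"CDC5", "SWE1"}],
    [{"CDC28"}, {"SWE1"}],
]
doc_counts, sent_counts = comention_counts(documents)
print(weighted_score(("CDC28", "SWE1"), doc_counts, sent_counts))  # 4.0
```

Pairs are stored in sorted order, which reflects that the resulting associations are undirected and carry no relation type.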
text/data integration
STRING
protein associations
string-db.org
STITCH
STRING + 300k chemicals
stitch-db.org
COMPARTMENTS
subcellular localization
compartments.jensenlab.org
TISSUES
tissue expression
tissues.jensenlab.org
DISEASES
disease–gene associations
diseases.jensenlab.org
curated knowledge
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
experimental data
gene expression
computational predictions
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
many databases
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
score calibration
visualization
web interfaces
bulk download
why so many resources?
Swiss army knife syndrome
medical text mining
electronic health records
opt-out
opt-in
structured data
Jensen et al., Nature Reviews Genetics, 2012
unstructured data
clinical narrative
Danish
busy doctors
psychiatric patients
named entity recognition
custom dictionaries
diseases
drugs
adverse events
expansion rules
phonetic spelling
typos
sentence filters
negations
family members
delusions
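
A sentence filter of the kind listed above can be sketched with simple cue-word rules. This is only an illustration: the cue words are English placeholders (the records in the talk are Danish), the rules discard a whole sentence as soon as a cue appears, the filter for delusions is not shown, and all names are hypothetical.

```python
import re

# Illustrative English cue words; a real system would need curated, language-specific lists
NEGATION = re.compile(r"\b(no|not|denies|without)\b", re.IGNORECASE)
FAMILY = re.compile(r"\b(mother|father|sister|brother|family)\b", re.IGNORECASE)

def keep_sentence(sentence):
    """Keep a sentence only if it contains neither a negation nor a family cue."""
    return not (NEGATION.search(sentence) or FAMILY.search(sentence))

def filter_mentions(sentences, terms):
    """Return dictionary terms found in sentences that pass the filters."""
    found = []
    for sentence in sentences:
        if not keep_sentence(sentence):
            continue
        lowered = sentence.lower()
        found.extend(term for term in terms if term in lowered)
    return found

# Made-up clinical sentences
notes = [
    "Patient has diabetes.",
    "No signs of depression.",
    "Mother had diabetes.",
]
print(filter_mentions(notes, ["diabetes", "depression"]))  # ['diabetes']
```

Only the first sentence contributes a mention; the second is dropped as negated and the third because it refers to a family member rather than the patient.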
detailed disease profiles
Roque et al., PLOS Computational Biology, 2011
[Figure: overlap between assigned codes and text-mined codes]
comorbidity
Roque et al., PLOS Computational Biology, 2011
patient stratification
Roque et al., PLOS Computational Biology, 2011
pharmacovigilance
structured medication data
text-mined adverse events
Eriksson et al., submitted, 2013
EMBO Practical Course Computational Biology: Genomes to Systems, Puerto Varas, 3-9 April 2014
Thank you!