Lars Juhl Jensen
The pragmatic text miner
It’s just another type of poorly standardized data
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
text corpus
comprehensive lexicon
synonyms
expansion rules
prefixes and suffixes
flexible matching
hyphens and spaces
“black list”
a
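The named entity recognition steps above (a lexicon with synonyms, flexible matching of hyphens and spaces, and a black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger behind the resources named in this talk: the lexicon entries, the black list, and the function names `name_to_pattern`, `build_matcher`, and `tag` are all assumptions made for the example, and prefix/suffix expansion rules are left out.

```python
import re

# Toy lexicon (assumption): canonical identifier -> names and synonyms.
LEXICON = {
    "CDK1": ["CDK1", "CDC2", "cyclin-dependent kinase 1"],
    "TP53": ["TP53", "p53"],
    "WAS": ["WAS", "WASP"],
}

# Toy black list (assumption): names too ambiguous to tag in running text.
BLACKLIST = {"a", "was"}


def name_to_pattern(name):
    """Build a regex in which hyphens and spaces are optional and interchangeable."""
    # Split the name into runs of letters, runs of digits, and other symbols.
    tokens = re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d\s-]+", name)
    return r"[-\s]?".join(re.escape(token) for token in tokens)


def build_matcher(lexicon, blacklist):
    """Compile one case-insensitive regex covering all non-blacklisted names."""
    entries = []
    for identifier, names in lexicon.items():
        for name in names:
            if name.lower() in blacklist:
                continue  # skip names on the black list
            entries.append((identifier, name_to_pattern(name)))
    # Longest patterns first, so longer names are tried before shorter ones.
    entries.sort(key=lambda entry: len(entry[1]), reverse=True)
    alternatives = "|".join(
        f"(?P<g{i}>{pattern})" for i, (_, pattern) in enumerate(entries)
    )
    regex = re.compile(
        rf"(?<![A-Za-z0-9])(?:{alternatives})(?![A-Za-z0-9])", re.IGNORECASE
    )
    return regex, [identifier for identifier, _ in entries]


def tag(text, regex, identifiers):
    """Return (start, end, matched text, identifier) for each match."""
    hits = []
    for match in regex.finditer(text):
        index = int(match.lastgroup[1:])  # group name "g<i>" -> i
        hits.append((match.start(), match.end(), match.group(0), identifiers[index]))
    return hits


regex, identifiers = build_matcher(LEXICON, BLACKLIST)
hits = tag("Cdc2 (CDK-1) phosphorylates p53, which was unexpected.", regex, identifiers)
tagged_ids = [hit[3] for hit in hits]
```

In this sketch "Cdc2" and "CDK-1" both resolve to the same identifier, while the word "was" is left untagged because it is on the black list.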
co-mentioning
within documents
within paragraphs
within sentences
weighted score
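One way to read the co-mentioning slides is as a weighted count: each document contributes a little for a pair mentioned anywhere in it, more if the pair shares a paragraph, and more again if it shares a sentence. The sketch below assumes the entities are already tagged, and the weights, the nested-list input layout, and the name `comention_scores` are assumptions chosen for illustration. A real scoring scheme would probably also normalize for how often each entity is mentioned overall, which this sketch does not do.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights (assumption): closer co-mentions add more to the score.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def pairs(entities):
    """All unordered entity pairs, as sorted tuples."""
    return set(combinations(sorted(entities), 2))


def comention_scores(documents, weights=WEIGHTS):
    """Sum weighted co-mentions over documents.

    Each document is a list of paragraphs, each paragraph a list of
    sentences, and each sentence a set of tagged entity identifiers.
    """
    scores = defaultdict(float)
    for paragraphs in documents:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for sentences in paragraphs:
            paragraph_entities = set()
            for entities in sentences:
                sentence_pairs |= pairs(entities)
                paragraph_entities |= set(entities)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += (
                weights["document"]
                + weights["paragraph"] * (pair in paragraph_pairs)
                + weights["sentence"] * (pair in sentence_pairs)
            )
    return dict(scores)


# One toy document: two paragraphs, the first with two sentences.
documents = [
    [
        [{"CDK1", "TP53"}, {"WAS"}],
        [{"EGFR"}],
    ]
]
scores = comention_scores(documents)
```

Here the pair sharing a sentence gets the highest score, pairs that only share a paragraph score lower, and pairs that only share the document score lowest.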
unifying text & data
text mining
curated knowledge
experimental data
computational predictions
integrated web resources
protein networks
string-db.org
chemical networks
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
many sources
different formats
different identifiers
variable quality
hard work
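The integration problem on these slides (many sources, different formats, different identifiers, variable quality) can be illustrated with a small sketch. Everything here is an assumption made for the example: the alias table, the two input formats, the per-source confidence values, and the function names. It only shows the general shape of the work, which is mapping every record to one identifier namespace and one confidence scale before merging.

```python
# Toy alias table (assumption): (namespace, identifier) -> common gene name.
ALIASES = {
    ("uniprot", "P04637"): "TP53",
    ("ensembl", "ENSG00000141510"): "TP53",
}

# Hypothetical per-source confidence values reflecting variable quality.
SOURCE_CONFIDENCE = {"curated": 0.9, "text_mining": 0.4}


def normalize(namespace, identifier):
    """Map a source-specific identifier to the common name, or None if unknown."""
    return ALIASES.get((namespace, identifier))


def parse_tsv_line(line):
    """Source A (assumed format): tab-separated UniProt accession and disease."""
    accession, disease = line.rstrip("\n").split("\t")
    return normalize("uniprot", accession), disease, "curated"


def parse_record(record):
    """Source B (assumed format): dict with an Ensembl gene ID and a disease."""
    return normalize("ensembl", record["gene"]), record["disease"], "text_mining"


def integrate(rows):
    """Keep the most confident evidence per (gene, disease); drop unmapped rows."""
    best = {}
    for gene, disease, source in rows:
        if gene is None:
            continue  # identifier could not be mapped
        confidence = SOURCE_CONFIDENCE[source]
        key = (gene, disease)
        if confidence > best.get(key, (0.0, None))[0]:
            best[key] = (confidence, source)
    return best


rows = [
    parse_tsv_line("P04637\tLi-Fraumeni syndrome\n"),
    parse_record({"gene": "ENSG00000141510", "disease": "Li-Fraumeni syndrome"}),
    parse_record({"gene": "UNMAPPED_ID", "disease": "Li-Fraumeni syndrome"}),
]
merged = integrate(rows)
```

Both mapped records collapse onto a single gene-disease pair, and the unmapped one is dropped rather than guessed at.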
collaboration model
domain experts
what?
why?
problem
manpower
me
how?
technology
guidance
biodiversity
organisms
environments
Encyclopedia of Life
Biodiversity Heritage Library
what we need
the format is not important
the license is
Acknowledgments
Protein networks
Michael Kuhn
Damian Szklarczyk
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
Jianyi Lin
Pablo Minguez
Christian von Mering
Peer Bork
Localization and disease
Sune Pletscher-Frankild
Alberto Santos
Janos Binder
Kalliopi Tsafou
Christian Stolte
Albert Palleja
Heiko Horn
Evangelos Pafilis
Reinhardt Schneider
Sean O’Donoghue