Mining the biomedical literature: Dictionary-based identification of proteins in text
Lars Juhl Jensen
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
text corpus
~2 million full-text articles
PubMed Central OA
freely available from journals
~22 million abstracts
Medline
comprehensive lexicon
genes and proteins
CDC2
cyclin dependent kinase 1
expansion rules
prefixes and suffixes
hCdc2
CDC2
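A minimal sketch of how expansion rules might turn one lexicon entry into several matchable variants. The prefix list and the function name `expand_names` are illustrative assumptions; the slides do not list the actual rules, and suffixes would be handled analogously.

```python
# Hypothetical species prefixes; the real rule set is not given in the slides.
PREFIXES = ("h", "m")  # e.g. "h" for human, "m" for mouse


def expand_names(names, prefixes=PREFIXES):
    """Return lowercased names plus prefixed variants for case-insensitive lookup."""
    variants = set()
    for name in names:
        base = name.lower()
        variants.add(base)
        for prefix in prefixes:
            variants.add(prefix + base)
    return variants


variants = expand_names(["CDC2"])
print(sorted(variants))
```

With these assumed prefixes, a mention written as hCdc2 lowercases to a variant generated from CDC2.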
flexible matching
spaces and hyphens
cyclin-dependent kinase 1
cyclin dependent kinase 1
“black list”
SDS
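The matching steps above can be sketched as follows: names are matched case-insensitively with spaces and hyphens treated as interchangeable, and blacklisted names are skipped. The tiny lexicon, the identifier strings, and the function names are assumptions for illustration, not the author's software.

```python
import re

# Illustrative lexicon mapping names to an identifier (assumed values).
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",
}
# Names that are in the lexicon but should never be tagged.
BLACKLIST = {"sds"}


def build_pattern(name):
    """Compile a regex allowing any run of spaces or hyphens between tokens."""
    tokens = re.split(r"[\s\-]+", name.strip())
    body = r"[\s\-]*".join(re.escape(token) for token in tokens)
    # Lookarounds keep matches from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


def tag(text, lexicon=LEXICON, blacklist=BLACKLIST):
    """Return sorted (start, end, matched text, identifier) tuples."""
    hits = []
    for name, identifier in lexicon.items():
        if name.lower() in blacklist:
            continue
        for match in build_pattern(name).finditer(text):
            hits.append((match.start(), match.end(), match.group(0), identifier))
    return sorted(hits)


text = "Cells were lysed in SDS buffer; cyclin-dependent kinase 1 (Cdc2) was detected."
for hit in tag(text):
    print(hit)
```

A regex per name is easy to read but would not scale to a full lexicon and millions of documents; a production tagger would more plausibly use a hash-based or trie-based lookup.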
fast efficient software
publication count
reviews and ‘omics studies
document weight
occurrences of protein X / occurrences of any protein
         Median   Mean
Tclin    295      864.8
Tchem    190      393.5
Tmacro   39       154.2
Tdark    0        11.1
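A short sketch of the document weight described above: within one document, a protein's weight is its share of all protein mentions, so a review or 'omics study that names many proteins contributes little to each. Summing the weights over documents to obtain a fractional publication count is an assumption here, as are the function names and the toy corpus.

```python
from collections import Counter, defaultdict


def document_weights(mentions):
    """Weight per protein = its occurrences / occurrences of any protein."""
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {protein: n / total for protein, n in counts.items()}


def fractional_counts(corpus):
    """Sum document weights over all documents (assumed aggregation)."""
    scores = defaultdict(float)
    for mentions in corpus:
        for protein, weight in document_weights(mentions).items():
            scores[protein] += weight
    return dict(scores)


# Toy corpus: each document is the list of protein mentions found in it.
corpus = [
    ["CDK1", "CDK1", "TP53", "CCNB1"],  # broad paper, four mentions
    ["CDK1"],                           # focused paper, one mention
]
print(fractional_counts(corpus))
```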
cooccurrences
diseases
tissues
cellular compartments
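Cooccurrence counting can be sketched as below, assuming each document has already been tagged with the proteins and the other entities (diseases, tissues, or cellular compartments) it mentions. The data layout and the function name are hypothetical, and no scoring beyond raw counts is implied.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(documents):
    """Count, per (protein, entity) pair, the documents mentioning both."""
    counts = Counter()
    for proteins, entities in documents:
        # Sets ensure each pair is counted at most once per document.
        for pair in product(set(proteins), set(entities)):
            counts[pair] += 1
    return counts


# Toy input: (proteins, other entities) per document; values are made up.
documents = [
    (["CDK1", "CDK1"], ["nucleus"]),
    (["CDK1", "TP53"], ["nucleus", "liver"]),
]
for pair, n in sorted(cooccurrence_counts(documents).items()):
    print(pair, n)
```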
?