Mining the biomedical literature: Dictionary-based identification of proteins in text

Lars Juhl Jensen

Mining the biomedical literature

Dictionary-based identification of proteins in text

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

text corpus

~2 million full-text articles

PubMed Central Open Access (OA) subset

freely available from journals

~22 million abstracts

Medline

comprehensive lexicon

genes and proteins

CDC2

cyclin dependent kinase 1
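A comprehensive lexicon maps every known name of a gene or protein to a canonical identifier, so that CDC2 and "cyclin dependent kinase 1" resolve to the same entry. The sketch below is a minimal illustration, assuming a toy input dictionary (`entries`) and a hypothetical helper `build_lexicon`. Names map to sets because one name can be ambiguous between several proteins.

```python
from collections import defaultdict

def build_lexicon(entries):
    """Map each case-folded synonym to the set of identifiers it may denote."""
    lexicon = defaultdict(set)
    for identifier, synonyms in entries.items():
        for name in synonyms:
            lexicon[name.casefold()].add(identifier)
    return dict(lexicon)

# Toy input (illustrative only): one canonical identifier and its synonyms.
entries = {"CDK1": ["CDK1", "CDC2", "cyclin dependent kinase 1"]}
lexicon = build_lexicon(entries)
print(sorted(lexicon["cdc2"]))  # ['CDK1']
```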

expansion rules

prefixes and suffixes

hCdc2

CDC2
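Expansion rules generate extra variants of each lexicon name, so that a form like hCdc2 in text can be linked back to CDC2. This is a sketch only: the rule lists (`PREFIXES` with "h" for human, `SUFFIXES` with the yeast-style "p" as in Cdc28p) and the function `expand` are hypothetical examples, not the actual rule set behind these slides.

```python
# Hypothetical rule lists: "h" as a species prefix (human) and "p" as a
# protein suffix (the yeast naming convention, e.g. Cdc28p).
PREFIXES = ["h"]
SUFFIXES = ["p"]

def expand(name):
    """Return the case-folded name plus one variant per prefix and per suffix."""
    variants = {name}
    for prefix in PREFIXES:
        variants.add(prefix + name)
    for suffix in SUFFIXES:
        variants.add(name + suffix)
    # Case-fold so that "hCdc2" in text matches the generated "hCDC2".
    return {v.casefold() for v in variants}

variants = expand("CDC2")
print("hCdc2".casefold() in variants)  # True
```

Combined prefix-plus-suffix variants are not generated here; a real rule set would have to decide whether such combinations are worth the extra false positives.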

flexible matching

spaces and hyphens

cyclin-dependent kinase 1

cyclin dependent kinase 1
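Flexible matching treats spaces and hyphens as interchangeable, so both spellings above hit the same lexicon entry. One simple way to get this, assumed here for illustration, is to normalize both lexicon names and text to a canonical form before comparing; `normalize` is a hypothetical helper name.

```python
import re

def normalize(name):
    """Case-fold and collapse any run of spaces or hyphens into one space."""
    return re.sub(r"[\s-]+", " ", name.casefold()).strip()

print(normalize("cyclin-dependent kinase 1") == normalize("cyclin dependent kinase 1"))  # True
```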

“black list”

SDS
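A black list removes names that match the lexicon but usually mean something else in text. SDS is a case in point: it is a gene symbol (serine dehydratase) but is far more often the abbreviation for sodium dodecyl sulfate. The sketch below assumes a hypothetical `BLACKLIST` set and a `filter_matches` helper applied after matching.

```python
# Hypothetical black list of case-folded names that are rarely protein mentions.
BLACKLIST = {"sds"}

def filter_matches(matches):
    """Drop matched names that appear on the black list (case-insensitive)."""
    return [m for m in matches if m.casefold() not in BLACKLIST]

print(filter_matches(["CDC2", "SDS", "hCdc2"]))  # ['CDC2', 'hCdc2']
```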

fast, efficient software
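Dictionary matching can be made fast because every candidate lookup is a hash-table probe. The sketch below is a greedy longest-match tagger over word n-grams, assuming a toy `lexicon` dict and a hypothetical `max_tokens` limit on name length; it is an illustration of the general technique, not the software the slides refer to. Splitting text on non-alphanumeric characters also makes spaces and hyphens equivalent for free.

```python
import re

def tag(text, lexicon, max_tokens=5):
    """Greedy longest-match dictionary tagger.

    lexicon maps normalized names to identifiers. At each token position the
    longest n-gram is tried first, so a full name such as
    "cyclin dependent kinase 1" wins over any shorter name inside it.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).casefold()
            if key in lexicon:  # one hash lookup per candidate n-gram
                hits.append((key, lexicon[key]))
                i += n
                break
        else:
            i += 1  # no name starts here; move to the next token
    return hits

lexicon = {"cdc2": "CDK1", "cyclin dependent kinase 1": "CDK1"}
print(tag("Cyclin-dependent kinase 1 (CDC2) is active.", lexicon))
# [('cyclin dependent kinase 1', 'CDK1'), ('cdc2', 'CDK1')]
```

The cost is at most `max_tokens` lookups per token, so run time grows linearly with the amount of text.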

publication count

reviews and ’omics studies

document weight

occurrences of protein X / occurrences of any protein
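With this weight, a focused paper that mostly discusses protein X counts almost fully towards X's publication count, while a review or ’omics study that mentions hundreds of proteins adds only a small fraction to each. A minimal sketch, assuming each document has already been reduced to a list of protein identifiers (one entry per occurrence); `weighted_counts` and the toy data are hypothetical.

```python
from collections import Counter

def weighted_counts(documents):
    """Weighted publication count per protein.

    Each document contributes, for every protein X it mentions,
    (occurrences of X) / (occurrences of any protein) in that document.
    """
    totals = Counter()
    for mentions in documents:
        if not mentions:
            continue  # documents without any protein contribute nothing
        n = len(mentions)
        for protein, k in Counter(mentions).items():
            totals[protein] += k / n
    return totals

# Toy data: a focused paper and a large-scale study mentioning 100 proteins.
docs = [
    ["CDK1", "CDK1", "CCNB1"],
    ["CDK1"] + [f"P{i}" for i in range(99)],
]
print(round(weighted_counts(docs)["CDK1"], 3))  # 0.677
```

The weights within each document sum to 1, so the total over all proteins equals the number of documents that mention at least one protein.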

Tclin: median 295, mean 864.8
Tchem: median 190, mean 393.5
Tmacro: median 39, mean 154.2
Tdark: median 0, mean 11.1

co-occurrences

diseases

tissues

cellular compartments
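The same tagged corpus can link proteins to diseases, tissues, or cellular compartments by counting how often they are mentioned in the same document. The sketch below counts each pair at most once per document; the input format (a pair of identifier sets per document), the helper `cooccurrence_counts`, and the toy data are assumptions for illustration. Raw counts like these favor well-studied proteins, so in practice some normalization, in the spirit of the document weight above, would likely be needed.

```python
from collections import Counter
from itertools import product

def cooccurrence_counts(documents):
    """Count documents in which a protein and another entity co-occur.

    Each document is a pair (proteins, others) of sets of identifiers,
    e.g. the proteins and diseases tagged in one abstract. Using sets
    means each pair is counted at most once per document.
    """
    pairs = Counter()
    for proteins, others in documents:
        for pair in product(sorted(proteins), sorted(others)):
            pairs[pair] += 1
    return pairs

# Toy data (illustrative only).
docs = [
    ({"CDK1"}, {"cancer"}),
    ({"CDK1", "TP53"}, {"cancer"}),
    ({"TP53"}, {"liver"}),
]
counts = cooccurrence_counts(docs)
print(counts[("CDK1", "cancer")])  # 2
```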

?
