Advanced bioinformatics of proteomics datasets
Lars Juhl Jensen
three parts
signaling networks
association networks
text mining
signaling networks
MS-based proteomics
Linding, Jensen, Ostheimer et al., Cell, 2007
in vivo PTM sites
actors are unknown
sequence specificity
logo plots
Miller, Jensen et al., Science Signaling, 2008
motif collections
Eukaryotic Linear Motifs
regular expressions
[VILMAFP](K).E
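A minimal sketch of how a linear motif written as a regular expression can be scanned against a protein sequence. The pattern is copied from the slide; the function name and the example sequence are hypothetical, and the lookahead is an added convenience so that overlapping matches are also reported.

```python
import re

# Pattern from the slide wrapped in a lookahead so overlapping hits are found.
# "." matches any residue; the parenthesised K is the captured lysine.
MOTIF = re.compile(r"(?=[VILMAFP](K).E)")

def find_motif_sites(sequence):
    """Return 1-based positions of the captured lysine for each motif match."""
    return [match.start(1) + 1 for match in MOTIF.finditer(sequence)]

if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    print(find_motif_sites("MAVKTEGG"))
```

A match only says that the sequence fits the pattern; it carries no information about which enzyme acts on the site.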
NetPhorest
machine learning
kinases
phospho-binding proteins
phosphatases
Miller, Jensen et al., Science Signaling, 2008
possible actors
no cellular context
co-activators
protein scaffolds
localization
expression
association network
Linding, Jensen, Ostheimer et al., Cell, 2007
NetworKIN
Linding, Jensen, Ostheimer et al., Cell, 2007
web interface
association networks
guilt by association
STRING
2000+ genomes
computational predictions
gene fusion
Korbel et al., Nature Biotechnology, 2004
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
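To illustrate the idea behind phylogenetic profiles, the sketch below compares the sets of genomes in which two gene families occur. Jaccard similarity is an assumption chosen for simplicity, not necessarily the measure used in the cited work, and the genome labels are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence profiles (sets of genome labels)."""
    union = profile_a | profile_b
    if not union:
        return 0.0  # neither family occurs anywhere; no evidence either way
    return len(profile_a & profile_b) / len(union)

if __name__ == "__main__":
    # Hypothetical genomes in which each gene family is present.
    family_x = {"genome1", "genome2", "genome3"}
    family_y = {"genome2", "genome3", "genome4"}
    print(profile_similarity(family_x, family_y))
```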
experimental data
gene coexpression
protein interactions
Jensen & Bork, Science, 2008
curated knowledge
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
not same species
hard work
(Ph.D. students)
parsers
common identifiers
clever ideas
quality assessment
scoring schemes
affinity purification
von Mering et al., Nucleic Acids Research, 2005
score calibration
gold standard
von Mering et al., Nucleic Acids Research, 2005
implicit weighting by quality
common scale
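One way to picture score calibration is to rank the raw scores, group them into bins, and record the fraction of pairs in each bin that the gold standard supports. The sketch below does this with equal-size bins. The binning strategy, function name, and toy data are assumptions for illustration, not the procedure from the cited papers.

```python
def calibrate(raw_scores, gold_positive, n_bins=2):
    """Return (lowest score, highest score, fraction in gold standard) per bin.

    raw_scores: dict mapping a protein pair to its raw score.
    gold_positive: set of pairs regarded as true associations.
    """
    ranked = sorted(raw_scores.items(), key=lambda item: item[1])
    if not ranked:
        return []
    size = -(-len(ranked) // n_bins)  # ceiling division
    bins = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        hits = sum(1 for pair, _ in chunk if pair in gold_positive)
        bins.append((chunk[0][1], chunk[-1][1], hits / len(chunk)))
    return bins

if __name__ == "__main__":
    # Hypothetical raw scores and gold standard.
    scores = {("a", "b"): 0.1, ("a", "c"): 0.2, ("b", "c"): 0.8, ("c", "d"): 0.9}
    gold = {("a", "b"), ("b", "c"), ("c", "d")}
    for low, high, fraction in calibrate(scores, gold):
        print(low, high, fraction)
```

Because every evidence type is mapped to the same kind of number (the fraction supported by the gold standard), scores from different sources become comparable, and lower-quality sources end up with lower calibrated scores.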
interologs
homology-based transfer
orthologous groups
Franceschini et al., Nucleic Acids Research, 2013
missing most of the data
text mining
>10 km
too much to read
exponential growth
~40 seconds per paper
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
CDC2
cyclin dependent kinase 1
orthographic variation
expansion rules
prefixes and suffixes
CDC2
hCdc2
flexible matching
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
“black list”
SDS
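The lexicon, expansion rules, flexible matching, and black list can be combined roughly as sketched below. The names come from the slides; the identifiers, the single "h" prefix rule, and the function names are hypothetical.

```python
import re

# Toy lexicon; the identifiers on the right are hypothetical placeholders.
LEXICON = {
    "CDC2": "ID_CDK1",
    "cyclin dependent kinase 1": "ID_CDK1",
    "SDS": "ID_SDS",
}
PREFIXES = ("h",)    # hypothetical expansion rule, e.g. "h" for human
BLACKLIST = {"sds"}  # names too ambiguous to tag, stored in normalized form

def normalize(name):
    """Lowercase the name and drop spaces and hyphens (flexible matching)."""
    return re.sub(r"[\s\-]+", "", name).lower()

def build_index(lexicon, prefixes, blacklist):
    """Map every normalized name and its prefixed variants to an identifier."""
    index = {}
    for name, identifier in lexicon.items():
        key = normalize(name)
        if key in blacklist:
            continue
        index[key] = identifier
        for prefix in prefixes:
            index.setdefault(prefix + key, identifier)
    return index

def lookup(mention, index):
    """Return the identifier for a mention, or None if it is not recognized."""
    return index.get(normalize(mention))

if __name__ == "__main__":
    index = build_index(LEXICON, PREFIXES, BLACKLIST)
    for mention in ("hCdc2", "cyclin-dependent kinase 1", "SDS"):
        print(mention, lookup(mention, index))
```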
information extraction
co-mentioning
counting
within documents
within paragraphs
within sentences
scoring scheme
score calibration
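A toy sketch of co-mention counting at the three levels listed above. Each pair of names is counted once per document at each level where the names co-occur, and the counts are combined with weights. The weights, the sentence splitter, and the function names are assumptions; a real scoring scheme would be calibrated against a gold standard.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}

def mentioned(text, names):
    """Return the subset of names that occur as whole words in the text."""
    return {n for n in names if re.search(r"\b" + re.escape(n) + r"\b", text)}

def comention_scores(documents, names):
    """Sum weighted co-mention counts for every pair of names."""
    scores = Counter()
    for doc in documents:
        paragraphs = doc.split("\n\n")
        sentences = [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p)]
        levels = (("document", [doc]), ("paragraph", paragraphs), ("sentence", sentences))
        for level, units in levels:
            pairs = set()
            for unit in units:
                pairs.update(combinations(sorted(mentioned(unit, names)), 2))
            for pair in pairs:
                scores[pair] += WEIGHTS[level]
    return scores

if __name__ == "__main__":
    text = "CYC1 is controlled by HAP1. CYC7 is too.\n\nHAP1 matters."
    print(comention_scores([text], ["CYC1", "CYC7", "HAP1"]))
```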
natural language processing
grammatical analysis
what you learned in school
pronoun pronoun verb preposition noun
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
complex sentences
summary
computational approaches
data integration
text mining
scoring schemes
general approach
protein networks
string-db.org
chemical networks
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations