Advanced bioinformatics of proteomics datasets

Lars Juhl Jensen

three parts

signaling networks

association networks

text mining

signaling networks

MS-based proteomics

Linding, Jensen, Ostheimer et al., Cell, 2007

in vivo PTM sites

actors are unknown

sequence specificity

logo plots

Miller, Jensen et al., Science Signaling, 2008
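Sequence logos show how specific each position around a site is. The following is a minimal sketch of the usual per-column information content: log2 of the alphabet size minus the Shannon entropy of the observed residues. It assumes the peptides are already aligned on the modified residue and ignores small-sample correction. The function name and example peptides are hypothetical.

```python
import math
from collections import Counter

def column_information(peptides, alphabet_size=20):
    """Information content (bits) at each position of equal-length aligned peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    max_entropy = math.log2(alphabet_size)  # 20 standard amino acids by default
    n = len(peptides)
    info = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        # Shannon entropy of the residue distribution in this column
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(max_entropy - entropy)
    return info

# Hypothetical peptides centered on a phosphorylated serine (position 3).
print(column_information(["RRASL", "RKSSL", "RRTSI"]))
```

In a logo, the height of each letter is then its frequency in the column multiplied by that column's information content.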

motif collections

Eukaryotic Linear Motifs

regular expressions

[VILMAFP](K).E
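This motif is written as a regular expression. It matches a hydrophobic residue from the set VILMAFP, then K, then any residue (.), then E. Here the parenthesized K is assumed to mark the modified lysine. The following is a minimal Python sketch that reports the 1-based position of that lysine for every match. The function name and test sequence are hypothetical.

```python
import re

# The motif from the slide; the capture group isolates the central lysine.
MOTIF = re.compile(r"[VILMAFP](K).E")

def find_motif_sites(sequence):
    """Return 1-based positions of the captured K for every match in a protein sequence."""
    return [m.start(1) + 1 for m in MOTIF.finditer(sequence)]

print(find_motif_sites("MAVKTEPIKLE"))
```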

NetPhorest

machine learning

kinases

phospho-binding proteins

phosphatases

Miller, Jensen et al., Science Signaling, 2008
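NetPhorest uses machine learning to model kinases, phospho-binding proteins and phosphatases. The sketch below is not NetPhorest's method. It is a simpler stand-in: a position-specific scoring matrix (PSSM) that scores a candidate peptide by summing log-odds against a background. It assumes equal-length aligned training sites, a uniform amino-acid background and a pseudocount of 1. The names and peptides are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(sites, pseudocount=1.0):
    """Log-odds matrix (bits) from equal-length aligned sites, uniform background assumed."""
    length = len(sites[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pssm.append({
            aa: math.log2((counts.get(aa, 0) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score_peptide(pssm, peptide):
    """Sum of per-position log-odds; higher means more similar to the training sites."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))

# Hypothetical training sites with the phosphoacceptor at position 3.
pssm = build_pssm(["RRASL", "RRGSI", "KRLSV", "RRSSL"])
print(score_peptide(pssm, "RRASL"), score_peptide(pssm, "EEAEE"))
```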

possible actors

no cellular context

co-activators

protein scaffolds

localization

expression

association network

Linding, Jensen, Ostheimer et al., Cell, 2007

NetworKIN

Linding, Jensen, Ostheimer et al., Cell, 2007
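NetworKIN combines sequence-motif predictions with context from an association network. The following is a minimal sketch of that idea under stated assumptions. Network context is taken as the best path between kinase and substrate, where a path's score is the product of its edge confidences. That path is found with Dijkstra's algorithm on -log(confidence), and its score is multiplied by a motif score. This combination rule, the node names and the numbers are illustrative assumptions, not NetworKIN's published scoring.

```python
import heapq
import math

def best_path_score(edges, source, target):
    """Highest product of edge confidences over any path between source and target."""
    graph = {}
    for a, b, confidence in edges:
        graph.setdefault(a, []).append((b, confidence))
        graph.setdefault(b, []).append((a, confidence))
    # Dijkstra on -log(confidence): the shortest path is the highest-product path.
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, confidence in graph.get(node, []):
            nd = d - math.log(confidence)  # confidences assumed to lie in (0, 1]
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return 0.0  # target unreachable

def rank_kinases(motif_scores, edges, substrate):
    """Illustrative combination: motif score times network-context score."""
    combined = {
        kinase: score * best_path_score(edges, kinase, substrate)
        for kinase, score in motif_scores.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical network: KIN_A links directly to the substrate, KIN_B only via X.
edges = [("KIN_A", "SUB", 0.9), ("KIN_B", "X", 0.5), ("X", "SUB", 0.5)]
print(rank_kinases({"KIN_A": 0.4, "KIN_B": 0.8}, edges, "SUB"))
```

In this example, KIN_B has the better motif score, but KIN_A ranks first because of its stronger network context. This is the kind of reranking that adding cellular context is meant to achieve.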

web interface

association networks

guilt by association
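Guilt by association transfers annotations from a protein's network neighbors. The following is a minimal sketch that assumes an undirected network with confidence scores. For each annotation, it sums the confidences of the neighbors that carry it. The function, proteins and labels are hypothetical.

```python
from collections import defaultdict

def predict_annotations(edges, annotations, protein):
    """Score each annotation of `protein`'s neighbors by the summed edge confidence."""
    scores = defaultdict(float)
    for a, b, confidence in edges:
        if protein in (a, b):
            neighbor = b if a == protein else a
            for label in annotations.get(neighbor, ()):
                scores[label] += confidence
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

edges = [("P1", "P2", 0.9), ("P1", "P3", 0.7), ("P1", "P4", 0.2)]
annotations = {"P2": {"DNA repair"}, "P3": {"DNA repair"}, "P4": {"glycolysis"}}
print(predict_annotations(edges, annotations, "P1"))
```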

STRING

2000+ genomes

computational predictions

gene fusion

Korbel et al., Nature Biotechnology, 2004

gene neighborhood

Korbel et al., Nature Biotechnology, 2004
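The gene neighborhood method looks for pairs of genes that stay adjacent on the chromosome across many genomes. The following is a minimal sketch. It assumes each genome is given as an ordered list of orthologous-group labels along the chromosome. It ignores strand and intergenic distance, which a fuller implementation would typically take into account. The labels are hypothetical.

```python
from collections import Counter

def conserved_neighbors(genomes):
    """Count, for each unordered pair of groups, the genomes in which they are adjacent."""
    counts = Counter()
    for genome in genomes:
        pairs = {frozenset((a, b)) for a, b in zip(genome, genome[1:]) if a != b}
        counts.update(pairs)  # each pair counted at most once per genome
    return counts

genomes = [
    ["G1", "G2", "G3", "G4"],
    ["G5", "G2", "G3", "G1"],
    ["G3", "G2", "G6"],
]
counts = conserved_neighbors(genomes)
print(counts[frozenset(("G2", "G3"))])
```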

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
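Phylogenetic profiles record which genomes contain each gene family. Families with similar profiles are predicted to be functionally associated. The following is a minimal sketch that compares binary presence/absence vectors by Jaccard similarity, which is one of several possible similarity measures. The profiles are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles over the same genomes."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

# 1 = family present, 0 = absent, in six hypothetical genomes.
profiles = {
    "famA": [1, 1, 0, 1, 0, 1],
    "famB": [1, 1, 0, 1, 0, 0],
    "famC": [0, 0, 1, 0, 1, 0],
}
print(jaccard(profiles["famA"], profiles["famB"]), jaccard(profiles["famA"], profiles["famC"]))
```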

experimental data

gene coexpression
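Co-expression links genes whose expression profiles rise and fall together across conditions. The following is a minimal sketch that uses Pearson correlation, one common choice of similarity measure, on hypothetical expression values.

```python
import math

def pearson(x, y):
    """Pearson correlation of two expression profiles measured under the same conditions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length profiles with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)

profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(pearson(profiles["geneA"], profiles["geneB"]), pearson(profiles["geneA"], profiles["geneC"]))
```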

protein interactions

Jensen & Bork, Science, 2008

curated knowledge

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

not same species

hard work

(Ph.D. students)

parsers

common identifiers

clever ideas

quality assessment

scoring schemes

affinity purification

von Mering et al., Nucleic Acids Research, 2005

score calibration

gold standard

von Mering et al., Nucleic Acids Research, 2005
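Calibrating against a gold standard turns raw, dataset-specific scores into comparable probabilities. The following is a minimal sketch of the binning idea. It assumes each scored pair is labeled True if the gold standard supports it (for example, shared membership in a curated pathway) and False otherwise. A real benchmark would also have to handle pairs that are absent from the gold standard. The bin edges, pairs and labels are hypothetical.

```python
from bisect import bisect_right

def calibrate(scored_pairs, bin_edges):
    """Map each raw-score bin to the fraction of its pairs confirmed by the gold standard.

    scored_pairs: list of (raw_score, in_gold_standard) tuples.
    bin_edges: sorted lower edges of the bins, starting at the lowest score of interest.
    """
    hits = [0] * len(bin_edges)
    totals = [0] * len(bin_edges)
    for raw, in_gold in scored_pairs:
        i = bisect_right(bin_edges, raw) - 1  # index of the bin containing `raw`
        if i < 0:
            continue  # below the first edge: outside the calibrated range
        totals[i] += 1
        hits[i] += int(in_gold)
    return [h / t if t else None for h, t in zip(hits, totals)]

pairs = [(0.1, False), (0.2, False), (0.3, True), (0.6, True),
         (0.7, True), (0.8, False), (0.9, True)]
print(calibrate(pairs, [0.0, 0.5]))
```

A smooth curve fitted through these points can then convert any raw score into a probability. Because each dataset is calibrated separately, lower-quality datasets automatically end up with lower probabilities.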

implicit weighting by quality

common scale
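Once every evidence channel is calibrated to the same probability scale, the channels can be combined. The following is a minimal sketch of the basic rule. It assumes the channels are independent, so the combined probability is one minus the probability that every channel is wrong. STRING's published scheme also corrects for the prior probability that two random proteins are linked; this sketch omits that step. The function name is hypothetical.

```python
def combine_scores(channel_probabilities):
    """Combine independent per-channel probabilities as 1 - product of (1 - p_i)."""
    remaining = 1.0
    for p in channel_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        remaining *= 1.0 - p  # probability that every channel seen so far is wrong
    return 1.0 - remaining

print(combine_scores([0.5, 0.5]))  # 0.75
```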

interologs

homology-based transfer

orthologous groups

Franceschini et al., Nucleic Acids Research, 2013
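Homology-based transfer maps an association observed in one organism onto the corresponding proteins of another. The following is a minimal sketch of transfer through orthologous groups. It assumes every protein has been assigned to at most one group. The sketch makes no attempt to adjust confidence for evolutionary distance or group size. The protein and group identifiers are hypothetical.

```python
def transfer_links(source_links, source_groups, target_groups):
    """Transfer protein pairs from a source species to a target via shared orthologous groups.

    source_links: iterable of (protein_a, protein_b) pairs in the source species.
    source_groups / target_groups: dict mapping protein -> orthologous group id.
    """
    # Invert the target mapping: group id -> target proteins in that group.
    members = {}
    for protein, group in target_groups.items():
        members.setdefault(group, []).append(protein)
    transferred = set()
    for a, b in source_links:
        group_a, group_b = source_groups.get(a), source_groups.get(b)
        if group_a is None or group_b is None:
            continue  # no orthology assignment, nothing to transfer
        for x in members.get(group_a, []):
            for y in members.get(group_b, []):
                if x != y:
                    transferred.add(tuple(sorted((x, y))))
    return transferred

links = [("yA", "yB")]
src = {"yA": "OG1", "yB": "OG2"}
tgt = {"hA": "OG1", "hB": "OG2", "hB2": "OG2"}
print(sorted(transfer_links(links, src, tgt)))
```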

missing most of the data

text mining

>10 km

too much to read

exponential growth

~40 seconds per paper

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

CDC2

cyclin dependent kinase 1

orthographic variation

expansion rules

prefixes and suffixes

CDC2

hCdc2
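Expansion rules grow a lexicon by generating variants of each curated name. For example, adding an organism prefix lets CDC2 also match hCdc2. The following is a minimal sketch with two illustrative rules: a capitalized form and a lowercase "h" prefix for human. The rule set is an assumption, and a practical lexicon would need more rules than this.

```python
def expand_name(name, prefixes=("h",)):
    """Generate orthographic variants of a gene name with simple, illustrative rules."""
    variants = {name, name.upper(), name.capitalize()}
    for prefix in prefixes:
        # Organism prefix glued to the name, e.g. CDC2 -> hCdc2.
        variants.update(prefix + v for v in (name.upper(), name.capitalize()))
    return variants

print(sorted(expand_name("CDC2")))
```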

flexible matching

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

“black list”

SDS
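Flexible matching treats spaces and hyphens as equivalent and ignores case. A black list suppresses names that usually mean something else; SDS, for example, presumably refers far more often to the detergent than to a gene. The following is a minimal sketch of a dictionary tagger that combines both, with longest match first. It assumes a lexicon that maps names to identifiers and tokenization on non-alphanumeric characters. The identifiers, lexicon and black-list entries are hypothetical.

```python
import re

def normalize(text):
    """Lowercase and treat runs of spaces or hyphens as a single space."""
    return re.sub(r"[\s\-]+", " ", text.lower()).strip()

def tag(text, lexicon, black_list, max_words=5):
    """Return (normalized_name, identifier) for lexicon names in text, longest match first."""
    norm_lexicon = {normalize(name): ident for name, ident in lexicon.items()}
    blocked = {normalize(name) for name in black_list}
    tokens = re.findall(r"[A-Za-z0-9]+", text)  # hyphens and punctuation split tokens
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = normalize(" ".join(tokens[i:i + n]))
            if candidate in norm_lexicon and candidate not in blocked:
                hits.append((candidate, norm_lexicon[candidate]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits

lexicon = {"cyclin-dependent kinase 1": "CDK1", "CDC2": "CDK1", "SDS": "SDS"}
text = "Cyclin dependent kinase 1 (CDC2) was run on an SDS gel."
print(tag(text, lexicon, ["SDS"]))
```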

information extraction

co-mentioning

counting

within documents

within paragraphs

within sentences

scoring scheme
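Co-mention scoring first counts how often two entities appear together in the same document, paragraph and sentence, with a weight for each level. It then rewards pairs that are both frequent and co-mentioned more often than their individual counts would predict. The following is a minimal sketch of a scheme of that general shape: S = C_ij^α × (C_ij · C_·· / (C_i· · C_·j))^(1−α). The weights and α are illustrative assumptions, not published parameters. The nested-list document format is also hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative weights and exponent (assumptions, not published values).
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 0.2}
ALPHA = 0.6

def pairs_of(entities):
    return {frozenset(p) for p in combinations(sorted(entities), 2)}

def weighted_counts(documents):
    """documents: list of documents; a document is a list of paragraphs,
    a paragraph is a list of sentences, a sentence is a set of entity ids."""
    counts = defaultdict(float)
    for doc in documents:
        doc_entities, para_pairs, sent_pairs = set(), set(), set()
        for paragraph in doc:
            para_entities = set()
            for sentence in paragraph:
                para_entities |= sentence
                sent_pairs |= pairs_of(sentence)
            doc_entities |= para_entities
            para_pairs |= pairs_of(para_entities)
        # Each level contributes at most once per document.
        for pair in pairs_of(doc_entities):
            counts[pair] += WEIGHTS["document"]
        for pair in para_pairs:
            counts[pair] += WEIGHTS["paragraph"]
        for pair in sent_pairs:
            counts[pair] += WEIGHTS["sentence"]
    return counts

def comention_scores(counts):
    """Score = C_ij^alpha * (observed / expected)^(1 - alpha), using symmetric marginals."""
    marginal = defaultdict(float)
    for pair, c in counts.items():
        for entity in pair:
            marginal[entity] += c
    total = sum(marginal.values())  # equals the sum over the full symmetric matrix
    scores = {}
    for pair, c in counts.items():
        a, b = tuple(pair)
        ratio = c * total / (marginal[a] * marginal[b])
        scores[pair] = c ** ALPHA * ratio ** (1 - ALPHA)
    return scores

documents = [[[{"A", "B"}, {"C"}], [{"A"}]], [[{"A", "C"}]]]
print(comention_scores(weighted_counts(documents)))
```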

score calibration

natural language processing

grammatical analysis

what you learned in school

(parts of speech: pronoun, pronoun, verb, preposition, noun)

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

complex sentences

summary

computational approaches

data integration

text mining

scoring schemes

general approach

protein networks

string-db.org

chemical networks

stitch-db.org

subcellular localization

compartments.jensenlab.org

tissue expression

tissues.jensenlab.org

disease associations