The Central Dogma -omics DNA: Genomics RNA: Genomics/Transcriptomics
Protein: Proteomics Metabolites: Metabolomics
Slide 2
Protein Machines Key Concept: Biochemical functions are carried
out by multi-protein machines The Polyadenylation Machinery The
Proteasome Key Concept: A protein's function can be inferred from its
binding partners Key Concept: Knowledge of a machine's components is
required to understand how it works and how it is regulated
Slide 3
Key Concept: Highly clustered areas typically serve the same
biological function. Protein Machines Interact with each other
in Higher-order Networks
Slide 4
Key Concept: Complex phenotypes can be understood in a network
context Understanding the Network May Give Insights into Emergent
Behaviors -Homeostasis -Robustness -Periodicity -Morphogenesis
-Tumorigenesis
Slide 5
Proteins Are Organized in a Small World Network Key Concept:
The proteome is HIGHLY Networked
Slide 6
The Small World Hypothesis: Six Degrees of Separation Stanley
Milgram study in 1967 -put ads in newspapers in Nebraska and Kansas
asking for volunteers for an experiment. The volunteers were asked
to contact a divinity student in Boston by going through people
that they knew on a first-name basis, who would then contact their
friends, and so on. -the number of people (degrees) between the
volunteers and the target ranged between 2 and 10, with the mean
being 6.
Slide 7
Properties of Small World Networks: -highly clustered: my
friends are also friends with each other -most nodes are not
connected: most people are strangers -presence of hubs (nodes with a
lot of connections): Facebook Whales -can find a short path between
any two nodes. Two strangers meet and realize they know some of the
same people. This path is often referred to as the degree of
separation -network should be resistant to perturbation: Life goes on
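Two of the properties listed above, clustering and short paths, can be measured directly. The following is a minimal sketch on a made-up friendship network (the names and links are hypothetical, not data from the lecture): the clustering coefficient asks how many of a node's neighbours know each other, and a breadth-first search counts the degrees of separation between two nodes.

```python
from collections import deque

# Hypothetical toy friendship network (undirected); not data from the slides.
EDGES = [
    ("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"),   # a tight cluster
    ("Cat", "Dan"),                                   # Cat bridges two groups
    ("Dan", "Eve"), ("Dan", "Fay"), ("Eve", "Fay"),   # a second cluster
]


def build_graph(edges):
    """Turn an edge list into an adjacency dictionary of sets."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = sorted(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return linked / (k * (k - 1) / 2)


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: number of links on the shortest path."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return seen[node]
        for nbr in graph[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return None  # the two nodes are not connected


GRAPH = build_graph(EDGES)
print("clustering(Ann):", clustering(GRAPH, "Ann"))
print("clustering(Cat):", round(clustering(GRAPH, "Cat"), 2))
print("Ann -> Eve:", degrees_of_separation(GRAPH, "Ann", "Eve"), "degrees")
```

In this toy network Ann's friends all know each other, while Cat, who links the two groups, has a lower clustering coefficient but keeps every path short.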
Slide 8
Clustered vs Non-Clustered
Slide 9
Distribution of Connections: 80/20 Law [Plot: number of nodes with
k links vs. number of links (k)]
Slide 10
Shortest and Longest Pathways
Ten Best Centers
1. CLU1 1.843
2. CDC33 1.867
3. TIF2 1.875
4. MDH1 1.898
5. SRP1 1.912
6. YBL004W 1.914
7. RPT3 1.914
8. HAS1 1.914
9. YGR090W 1.917
10. PFK1 1.918
Ten Worst Centers
CAC2 3.803
PSR1 3.838
RAM2 3.840
RAM1 3.840
ORC2 3.863
UBA3 3.902
MAK10 3.975
YNL056W 4.003
YNR046W 4.089
VPS4 4.433
Median Degree of Separation: 2.38
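A ranking of "centers" like the one above can be sketched as follows, assuming the score is a node's mean shortest-path distance to every other node (the slide does not state the exact metric, so this is an assumption). The network and node names here are placeholders, not yeast data.

```python
from collections import deque

# Hypothetical interaction map; node names are placeholders, not yeast proteins.
NETWORK = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub", "e"},
    "e": {"d"},
}


def distances_from(network, start):
    """Breadth-first search returning the link count to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in network[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_separation(network, node):
    """Average degrees of separation between one node and all the others."""
    dist = distances_from(network, node)
    others = [d for n, d in dist.items() if n != node]
    return sum(others) / len(others)


scores = {node: mean_separation(NETWORK, node) for node in NETWORK}
ranking = sorted(scores, key=scores.get)  # best (most central) first

for node in ranking:
    print(f"{node:>4}  {scores[node]:.3f}")
```

Low scores mark nodes that sit close to everything else (good centers); high scores mark peripheral nodes.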
Slide 11
Is S. cerevisiae Robust??? -Environmentally Robust -Robust to
temperature (4-40 C) -Robust to Nutrient Sources -Robust to
Starvation -Robust to Osmolarity (0-1 M NaCl) -Is it Robust to
Genetic Perturbation (mutation)??? -S. cerevisiae Genome Deletion
Project has deleted 95% of all S. cerevisiae genes -18.7% of genes
are essential -in a typical small world network you can lose ~20%
of all nodes before the network crashes.
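Robustness to node loss can be illustrated with a small sketch that deletes nodes and asks how much of the remaining network still hangs together. The network below is a hypothetical hub-and-spoke toy, and "fraction of surviving nodes in the largest connected piece" is one common way to score a crash, chosen here as an assumption.

```python
# Hypothetical toy network; node names are placeholders, not yeast genes.
NETWORK = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub", "e"},
    "e": {"d"},
}


def largest_component_fraction(network, removed):
    """Share of the surviving nodes that sit in the largest connected piece."""
    alive = set(network) - set(removed)
    if not alive:
        return 0.0
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        # Depth-first walk over one connected piece of the surviving network.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in network[node]:
                if nbr in alive and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best / len(alive)


print("delete peripheral node c:", largest_component_fraction(NETWORK, ["c"]))
print("delete the hub:          ", largest_component_fraction(NETWORK, ["hub"]))
```

Deleting a peripheral node leaves the toy network intact, while deleting the hub fragments it, which is the intuition behind asking whether highly connected proteins are more likely to be essential.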
Slide 13
Is there any biology behind the network hypothesis? Key Concept:
Connectivity and essentiality are correlated. Essential ORF
deletions are only available as heterozygous diploids, while
non-essential ORF deletions are available as haploids, homozygous
diploids and heterozygous diploids.
Slide 14
Evolutionary Effects of Connectedness -Connected genes are
non-randomly distributed in the genome -Connected genes are less likely
to undergo duplication -Connected genes are less likely to have
close homologs -Connected genes are less likely to have
introns
Slide 15
Evolutionary Effects of Connectedness
Slide 16
Is S. cerevisiae Robust??? -Environmentally Robust -Robust to
temperature (4-40 C) -Robust to Nutrient Sources -Robust to
Starvation -Robust to Osmolarity (0-1 M NaCl) -Is it Robust to
Genetic Perturbation (mutation)??? -S. cerevisiae Genome Deletion
Project has deleted 95% of all S. cerevisiae genes -18.7% of genes
are essential Is Cancer a Robust Network? -Environmentally Robust
-It lives under a constant state of genomic stress
Slide 17
Summary -Proteins are organized in functional units (machines)
-these machines do virtually all the work in the cell
-understanding the components of a machine is critical for
functionally annotating the genome -understanding the components of
a machine is critical for determining how a machine is regulated
-the effects of mutation are great at this level -Protein Machines
are organized into higher order Networks -the Network architecture
has left its imprint on evolution -the Network is likely to be
rewired under pathological conditions -especially in
the case of cancer -understanding the Network is important for
understanding the complex behavior of the system Key Concept: High
Throughput mapping of protein:protein interactions will provide
important insights into human biology
Slide 18
Understanding the Network Requires a lot of Information
-Direction of Information -Sign -Magnitude -Timing
Slide 23
Approaches for Mapping Protein:Protein Interactions -Mapping by
Inference: -if two proteins interact in one organism then they
interact in other organisms. -can be extended to domains/motifs as
well -if two proteins are coregulated on microarrays they are
likely to interact -Direct Mapping: -In vitro binding experiment
-Genetic Screen/Trap -Yeast 2-hybrid assay -Affinity
Co-purifications -IP:Western blot -IP:Mass Spectrometry
Slide 24
Interactomics by Genetic Screens Key Concept: Genetic
Complementation allows the identification of direct (binary)
interactions. Uetz et al 2001
Slide 25
Interactomics by Genetic Screens Key Concept: No matter how
good something is, there are always problems. Advantages of Genetic
Complementation: -can do genome scale screening -quick -cheap
-adaptable -works best when the screen is based on selection
Problems of Genetic Complementation: -sensitive to dynamic range
-protein interaction may be incompatible with the complementation
scheme -cannot perturb the system -more false positives than true
positives
Slide 26
Affinity Governs the formation of Protein Complexes Affinity is
determined by the shapes of the proteins and how well they fit
together. -hydrophobic interactions -ionic interactions -hydrogen
bonding Affinity is usually expressed as K_d. For a simple binding
reaction A + B <-> AB, K_d is the concentration of free B that
results in equivalent [A] and [AB]. Implicitly, there is usually a
mixture of free and complexed components and this ratio is
concentration dependent. K_d = ([A] x [B]) / [AB]
Slide 27
Affinity Governs the formation of Protein Complexes A weak
interaction may only form if the concentration is high enough.
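The concentration dependence can be made concrete with a small sketch. It assumes simple 1:1 binding (A + B <-> AB) with B in excess, so that the fraction of A in complex is [B] / (K_d + [B]); the two K_d values and the concentrations are hypothetical, chosen only to contrast a tight and a weak interaction.

```python
def fraction_bound(free_partner_conc, kd):
    """Fraction of protein A found in complex for 1:1 binding, A + B <-> AB.

    Follows from K_d = [A][B]/[AB]: [AB] / ([A] + [AB]) = [B] / (K_d + [B]).
    Concentrations and K_d must be in the same units (molar here).
    """
    return free_partner_conc / (kd + free_partner_conc)


KD_TIGHT = 1e-9  # 1 nM, hypothetical tight interaction
KD_WEAK = 1e-6   # 1 uM, hypothetical weak interaction

for conc in (1e-9, 1e-7, 1e-5):
    tight = fraction_bound(conc, KD_TIGHT)
    weak = fraction_bound(conc, KD_WEAK)
    print(f"[B] = {conc:.0e} M   tight: {tight:.2f}   weak: {weak:.2f}")
```

When [B] equals K_d, half of A is in complex; the weak interaction stays almost entirely unbound until the partner concentration approaches its K_d.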
Slide 28
Interactomics by Co-purification Key Concept: Interacting
proteins will co-purify TAP Tagging: Rigaut et al. Nat. Biotech.
1999.
Slide 29
Interactomics by Co-purification Advantages of Co-purification:
-proteins isolated from their native source -the system can be
perturbed Problems of Co-purification: -sensitive to dynamic range
-real interactions may be lost during purification -can be
difficult to purify the target protein -no amplification -need a
way to identify the co-purifying proteins
Slide 30
antibody bead IP
Slide 31
antibody bead trypsin digest direct antibody bead IP Key
Concept: Cutting out steps is one of the hallmarks of
high-throughput approaches. This increases the throughput and
usually also increases the sensitivity.
Slide 32
antibody bead trypsin digest directly from beads antibody bead
IP Affinity Purification - coIP
Slide 33
Native Antibody is Resistant to Trypsin
Slide 34
Reduced/Denatured Antibody is Sensitive to Trypsin
Slide 35
antibody bead IP Key Concept: Complex mixtures cannot be
manually interpreted. The average protein generates ~50 proteolytic
fragments, so you will have 1000s and 1000s to interpret.
Slide 36
Sources of Non-Specific Binding -Not enough washing. -Biofluids
have a high dynamic range so you must wash away the super abundant
stuff to see the less concentrated proteins -Proteins that stick to
the beads -Proteins that stick to the antibodies on the beads
-Proteins that stick to the wall of the tube -Proteins that stick
to your complex of interest -Proteins that are real binders but are
biologically irrelevant
Slide 37
Nonspecific Binding is Reproducible
Slide 38
Slide 39
Are All Protein Complexes Biologically Relevant? An interaction
will be selected for if it is beneficial. An interaction will be
selected against if it is detrimental. What happens if the
interaction is neither beneficial nor detrimental? What would be
the cost of allowing only beneficial interactions? +
Slide 40
Slide 41
Protein ID by Mass Spectrometry
Slide 42
MultiDimensional Chromatography (MuDPIT)
Slide 43
10-100 Proteins (6 hours) 100-300 Proteins (2 hours) 1000-6000
Proteins (10 hours) Comparison of Three Analysis Techniques on
Lysates
Slide 44
53 Proteins (6 hours) 76 Proteins (2 hours) 82 Proteins (10
hours) Comparison of Three Analysis Techniques on IPs
Slide 45
Protein ID by Mass Spectrometry ~10,000 MS/MS per hour Key
Concept: LC-MS/MS workflows cannot be manually interpreted
Compute a Correlation Score [Diagram: spectrum intensities (0% to
100%) compared with whether each peak is predicted (1 or 0); matched
peaks are y/b ions]
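One simple way to score a spectrum against a candidate peptide is sketched below. It uses a cosine (normalized dot-product) similarity between the observed intensities and a 0/1 vector of predicted peaks; this is an illustrative stand-in, not the hyperscore or any specific search engine's scoring function, and the intensities are invented.

```python
import math


def correlation_score(intensities, predicted):
    """Cosine similarity between observed intensities and a 0/1 prediction vector.

    intensities: observed peak intensities, one per m/z bin (e.g. 0-100 %).
    predicted:   1 if the candidate peptide predicts a y/b ion in that bin, else 0.
    """
    if len(intensities) != len(predicted):
        raise ValueError("both vectors must cover the same m/z bins")
    dot = sum(i * p for i, p in zip(intensities, predicted))
    norm = math.sqrt(sum(i * i for i in intensities)) * math.sqrt(
        sum(p * p for p in predicted)
    )
    return dot / norm if norm else 0.0


# Hypothetical binned spectrum and two candidate peptides.
observed = [100, 0, 50, 0]
candidate_good = [1, 0, 1, 0]  # predicts peaks where intensity was observed
candidate_bad = [0, 1, 0, 1]   # predicts peaks where nothing was observed

print("good candidate:", round(correlation_score(observed, candidate_good), 3))
print("bad candidate: ", round(correlation_score(observed, candidate_bad), 3))
```

Every candidate gets a score, including poor ones, which is why the scores need a statistical interpretation rather than being taken at face value.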
Slide 48
The Truth about Spectral Matching -Spectral matching produces an
answer for every spectrum, even those that are artifacts.
-Experimental spectra always deviate from theoretical spectra. -A
high correlation score is not a guarantee that the match is correct.
-A peptide must be in the database in order to be found.
Slide 49
Peptide ID by Mass Spectrometry
Slide 50
Peptide IDs can be clustered into Protein IDs
Slide 51
Mapping Peptides to Proteins is NOT easy!
Slide 52
Single Proteins to Protein Lists
Slide 53
How do you know which matches to trust???
Slide 54
The Truth about Spectral Matching -Spectral matching produces an
answer for every spectrum, even those that are clearly artifacts.
-Experimental spectra always deviate from theoretical spectra. -A
high correlation score is not a guarantee that the match is correct.
-A peptide must be in the database in order to be found. -An E-value
is easily calculated using the ~11,000 incorrect peptides. -The
false discovery rate is easily calculated using a decoy database
Key Concept: Statistics are required for the proper interpretation
of MS/MS data.
Slide 55
Histogram of Correlation Scores [Plot: # results vs. hyperscore;
label: incorrect IDs] Highest scoring match is assumed to be correct
Interpreting E-values [Plot: log(# results) vs. hyperscore;
E-value = 10^-8.2] The E-value is the number of matches you expect
to find at random, given the search parameters, or the chance of
getting a match this good from this spectrum by random chance. So
this would be a chance of 10^-8.2, or 1 in ~150,000,000.
Slide 59
Interpreting E-values [Plot: log(# results) vs. hyperscore;
E-value = 10^-8.2 vs. E-value = 10^-3.9] Changing the search
parameters will change the statistics: allowing many
post-translational modifications and/or amino acid substitutions,
using a very large database, allowing large mass errors, etc. Can
use a decoy or false database to verify the statistical models
being used.
Slide 60
Use a Decoy Database to Determine False Discovery Rate A good
decoy database: -does not contain any correct hits -is the same
size as the query database -has the same distribution of amino
acids -has the same size distribution of proteolytic fragments
(peptides) -can be reproduced by other labs Reversed databases
satisfy most of these constraints -so the protein RSAMPLER digested
with trypsin gives: -SAMPLER (forward) -ELPMASR (reverse, from
RELPMASR) -using the reverse typically only gives problems with
palindromic sequences, which thankfully are rare*. *except in viruses!
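The RSAMPLER example can be reproduced with a short sketch. The digest rule used here (cleave after K or R unless the next residue is P) is the usual textbook simplification of trypsin specificity, and the function names are illustrative only.

```python
def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (simplified rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def reversed_decoy(sequence):
    """Decoy entry: the same residues in reverse order."""
    return sequence[::-1]


protein = "RSAMPLER"
decoy = reversed_decoy(protein)

print("forward:", protein, "->", tryptic_digest(protein))
print("reverse:", decoy, "->", tryptic_digest(decoy))
```

The decoy keeps the same length and amino acid composition as the real protein, and anyone can regenerate it from the forward database. The single-residue peptide R appears in both digests but is too short to be informative, which leaves SAMPLER and ELPMASR as the peptides of interest.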
Slide 61
Use a Decoy Database to Determine False Discovery Rate By
definition: Everything from the Reversed database is
incorrect!
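Because every reversed-database match is wrong by construction, counting them gives an estimate of how many forward matches are wrong too. The sketch below assumes a search against forward and reversed entries together and estimates the global FDR at a score cutoff as (decoy matches) / (forward matches); the scores are invented for illustration.

```python
def global_fdr(matches, threshold):
    """Estimate the false discovery rate among matches scoring >= threshold.

    matches: iterable of (score, is_decoy) pairs.
    Estimate: decoy hits / forward (target) hits above the cutoff.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical search results: (score, came from the reversed database?)
MATCHES = [
    (50, False), (45, False), (40, False), (35, False),
    (30, False), (25, False), (20, False), (15, False),
    (22, True), (18, True), (12, True),
]

for cutoff in (30, 15):
    print(f"score >= {cutoff}: estimated FDR = {global_fdr(MATCHES, cutoff):.2f}")
```

Lowering the cutoff admits more identifications but also more decoy hits, so the cutoff can be chosen to hit a target FDR such as 1%.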
Slide 62
Use a Decoy Database to Determine Global False Discovery Rates
< 0.5% < 1% < 1.5% < 2.0% < 2.3% < 2.6%
Slide 63
Calculating a local False Discovery Rate [Histogram: number of
forward and reverse spectra in each bin; bins at 0, -1, -1.5, -3.0,
-4.0]
Slide 64
Predicting phosphorylation sites (low confidence)(high
confidence) Incorrect n = 51 S118 n = 6 S153 n = 54
Slide 65
Using FDR to choose the best Database
Slide 66
Amino Acid Substitutions can be modeled using Statistics
Slide 67
Allowing for Amino Acid Substitutions 117 vs 178 proteins
Slide 68
So now what do I do?
Slide 69
Interpreting Protein Lists Comparison based on Gene Ontology
(GO) Control Experiment
Slide 71
Functional Clustering of Protein Lists
Slide 72
Domain Enrichment
Slide 73
Enrichment of Components from known Pathways
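Enrichment of a pathway, domain, or GO term in a protein list is commonly tested with a hypergeometric (one-sided Fisher) test. The sketch below assumes that approach; the slides do not say which statistic the tools use, and the counts are invented.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated proteins by chance.

    population: proteins in the background (e.g. the whole proteome)
    annotated:  background proteins that belong to the pathway or term
    sample:     proteins in your hit list
    hits:       hit-list proteins that belong to the pathway or term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical numbers: 20 pathway members in a 1000-protein background,
# and 5 of them turn up in a 50-protein hit list.
p = enrichment_p_value(population=1000, annotated=20, sample=50, hits=5)
print(f"p = {p:.2e}")
```

When many terms are tested at once, as enrichment tools do, the p-values also need a multiple-testing correction.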
Slide 74
Build Networks
Slide 75
http://string-db.org/
Slide 76
Other Resources DAVID: http://david.abcc.ncifcrf.gov/ GO based
enrichment. Clustering of redundant GO terms. Mapping onto KEGG
pathways. Mapping onto disease pathways, etc. Biogrid:
thebiogrid.org An Online Interaction Repository With Data Compiled
Through Comprehensive Curation Efforts. MIPS:
http://mips.helmholtz-muenchen.de/proj/ppi/ Manually curated
protein-protein interaction database
Slide 77
Summary Proteomics: -large scale analysis of proteins (really
peptides) -statistical analysis is required for interpretation -can
be used to address a wide range of biological problems -best used
to answer discrete questions -things that cannot be answered by
genomic techniques -protein complexes -protein modification -other
post-translational events -changes in subcellular localization
-question will help determine which hits are chosen for validation
Feel free to email me questions: [email protected]
Slide 78
Ubiquitin: Short protein that is covalently attached to other
proteins 7 lysine residues, all of which can form poly-Ub chains K48
chains involved in proteasomal degradation K63 chains involved in
signaling K11 chains ???? K6 chains (DNA damage) K29 chains ????
Slide 79
K6-Ub Pulldown [Diagram: HA-tagged K6-Ub] Tagged ubiquitin with
only one available lysine. Pull down K6-linked poly-Ub chains.
Identify proteins. K6 chains are assembled by BRCA1
Slide 80
K6-Ubiquitin Pulldown
Slide 81
Hit Criteria Not in the control sample Not a commonly known
contaminant Good score (more than one peptide) - Expressed as a
False Discovery Rate Seen in repeat experiments
Slide 82
4959 4079 Potential Hits Ub and Ub- binding proteins Excluded:
heat shock hnRNP ribosomal keratin histones Proteins also found in
the control IP Results from a representative non-denaturing
K6-ubiquitin IP
Slide 83
Ubiquitin-binding proteins potential hits good scores high
confidence poor scores low confidence
Slide 84
20 22233 Overlap Between K6 and K63 Pulldowns UB-K6 UB-K63
Slide 85
Werner helicase interacting protein 1 (WHIP1) good scores high
confidence poor scores low confidence
Slide 86
WHIP domain architecture: Rad18-like Zn finger, AAA+ ATPase
Does not contain any recognizable Ubiquitin binding domain
Slide 87
Why does WHIP co-IP with ubiquitin? WHIP is ubiquitinated
(covalent bond) WHIP is a ubiquitin-binding protein (non-covalent
interaction)
Slide 93
WHIP domain architecture: Rad18-like Zn finger, AAA+ ATPase
Does not contain any recognizable Ubiquitin binding domain
Slide 94
WHIP's zinc finger domain is necessary for ubiquitin binding
[Figure, in vivo panel: mono-Ub; lanes -, WT, D37A, T294A; lysate,
blot = anti-FLAG; lysate, blot = anti-actin; IP = anti-FLAG, blot =
anti-Ub. In vitro panel: Ub2 to Ub7 chains; lanes Input, Beads, WHIP
UBZ, RAD18 UBA] Rad18_ZF = UBZ ubiquitin binding domain
Slide 95
UBZ Domain-Containing Proteins
Slide 96
Summary of UBZ Domain Binding
Slide 101
15 11135 Overlap Between Pulldowns using Different UBZ Domains
UBZ-WHIP UBZ-UBZ1
Slide 108
31 Overlap Between UBZ1 and WHIP UBZ1 WHIP 1645
Slide 109
Overlap Between UBZ1 and WHIP [Network diagram nodes: WHIP1,
ERCC1, ERCC4, DDB1, DDB2, Large T, dnaJ, RuvBL1, RuvBL2, UBZ1,
BLAP75, WRN, BLM, Ku70, Ku80, DNA-PK, PCNA, Topo3A, RPA1, Ub]
Slide 110
Summary -Proteins are part of a highly integrated network -The
UBZ domain mediates: Ubiquitin Binding, Coupled ubiquitination,
subcellular localization, protein:protein interactions -Functional
UBZ domains are found only in proteins involved in DNA replication
and/or repair -UBZ domains are frequently found in concert with PIP
boxes -The UBZ domain acts in concert with other domains to
regulate the formation of Ubiquitin-dependent complexes -New UBZ
domains are being found every day
Slide 111
Protein Networks: Giulia DeSabbata, Antonella Piccini, Michael P
Myers, Fabio Rossi, Martina Colombin Acknowledgements: Fabio Rossi,
Martina Colombin, Rebecca Bish, Antonella Piccini, Giulia DeSabbata,
Sandor Pongor Providing Reagents: Bruce Stillman, Masashi
Narita/Scott Lowe, Tomohiko Ohta, Toshiki Tsurimoto
UBZ-1 and WHIP are differentially regulated by UV damage
Slide 116
UBZ-1 but not WHIP Interacts with PCNA
Slide 117
Interaction between UBZ1 and PCNA is increased Following UV
treatment
Slide 123
Major Types of Proteomics Survey Proteomics: Qualitative or
Quantitative Analysis of the protein component -whole organism,
tissue, cell type, or subcellular compartment -2D gel
electrophoresis ->MS -typically a few 100 proteins
-Multidimensional LC->MS/MS -typically 1000-5000 proteins
Identification of Biomarkers Interactomics: Mapping Protein:Protein
Interactions -Yeast 2-hybrid techniques -high throughput protein
identification by Mass Spectrometry Mapping Post-Translational
Modifications -High Content Mass Spectrometry Key Concept:
Proteomics is the large scale identification of proteins or
peptides