The Central Dogma-omics
DNA: Genomics
RNA: Genomics/Transcriptomics
Protein: Proteomics
Metabolites: Metabolomics

Protein Machines
Key Concept: Biochemical functions are carried out by multi-protein machines.
Examples: the polyadenylation machinery; the proteasome.
Key Concept: A protein's function can be inferred from its binding partners.
Key Concept: Knowledge of a machine's components is required to understand how it works and how it is regulated.

Protein Machines Interact with Each Other in Higher-Order Networks
Key Concept: Highly clustered areas typically serve the same biological function.

Key Concept: Complex phenotypes can be understood in a network context.
Understanding the Network May Give Insights into Emergent Behaviors
-Homeostasis
-Robustness
-Periodicity
-Morphogenesis
-Tumorigenesis

Proteins Are Organized in a Small World Network Key Concept: The proteome is HIGHLY Networked

The Small World Hypothesis: Six Degrees of Separation
Stanley Milgram study in 1967
-put ads in newspapers in Nebraska and Kansas asking for volunteers for an experiment. The volunteers were asked to contact a divinity student in Boston by going through people that they knew on a first-name basis, who would then contact their friends, and so on.
-the number of people (degrees) between the volunteers and the target ranged between 2 and 10, with the mean being 6.

Properties of Small World Networks:
-highly clustered: my friends are also friends
-most nodes are not connected: most people are strangers
-presence of hubs (nodes with a lot of connections): Facebook Whales
-can find a short path between any two nodes: two strangers meet and realize they know some of the same people. This path is often referred to as the degree of separation.
-network should be resistant to perturbation: Life goes on
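Two of the properties listed above (clustering and short paths) can be measured directly on a graph. The following is a minimal sketch, assuming the network is stored as a dictionary that maps each node to the set of its neighbours; the toy network and the function names are hypothetical and are not taken from the slides.

```python
from collections import deque


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def degrees_of_separation(graph, start):
    """Breadth-first search: shortest path length from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


# Hypothetical toy network: two tight clusters joined through a hub "H".
toy = {
    "A": {"B", "C", "H"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "H": {"A", "X"},
    "X": {"H", "Y", "Z"},
    "Y": {"X", "Z"},
    "Z": {"X", "Y"},
}

print(clustering_coefficient(toy, "B"))      # "my friends are also friends"
print(degrees_of_separation(toy, "B")["Z"])  # path length between two "strangers"
```

Nodes inside a cluster have a high clustering coefficient, while the path between the two clusters stays short because it runs through the hub.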

Clustered vs Non-Clustered

Distribution of Connections: 80/20 Law
[Plot: number of nodes with k links vs. number of links (k)]

Shortest and Longest Pathways

Ten Best Centers
 1. CLU1      1.843
 2. CDC33     1.867
 3. TIF2      1.875
 4. MDH1      1.898
 5. SRP1      1.912
 6. YBL004W   1.914
 7. RPT3      1.914
 8. HAS1      1.914
 9. YGR090W   1.917
10. PFK1      1.918

Ten Worst Centers
CAC2      3.803
PSR1      3.838
RAM2      3.840
RAM1      3.840
ORC2      3.863
UBA3      3.902
MAK10     3.975
YNL056W   4.003
YNR046W   4.089
VPS4      4.433

Median Degree of Separation: 2.38
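One plausible reading of the per-protein values above is the mean shortest-path length from that protein to every other protein in the network; that interpretation is an assumption, since the slide does not define the score. Under that assumption, a ranking of "centers" can be sketched as follows, using a hypothetical five-node network with placeholder names P1 to P5.

```python
from collections import deque
from statistics import mean, median


def mean_separation(graph, start):
    """Mean shortest-path length from start to all other nodes (graph must be connected)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return mean(d for node, d in dist.items() if node != start)


def rank_centers(graph):
    """Nodes sorted from best center (smallest mean separation) to worst."""
    scores = {node: mean_separation(graph, node) for node in graph}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy network; the names are placeholders, not yeast genes.
toy = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}

ranking = rank_centers(toy)
print("best center:", ranking[0])
print("worst center:", ranking[-1])
print("median separation:", median(score for _, score in ranking))
```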

Is S. cerevisiae Robust???
-Environmentally Robust
  -Robust to temperature (4-40 C)
  -Robust to Nutrient Sources
  -Robust to Starvation
  -Robust to Osmolarity (0-1 M NaCl)
-Is it Robust to Genetic Perturbation (mutation)???
  -S. cerevisiae Genome Deletion Project has deleted 95% of all S. cerevisiae genes
  -18.7% of genes are essential
  -in a typical small world network you can lose ~20% of all nodes before the network crashes.

Is there any biology behind the network hypothesis?
[Centers table repeated: ten best and ten worst centers; median degree of separation: 2.38]
Key Concept: Connectivity and essentiality are correlated.
Essential ORF deletions are only available as heterozygous diploids, while non-essential ORF deletions are available as haploids, homozygous diploids and heterozygous diploids.

Evolutionary Effects of Connectedness
-Connected genes are non-randomly distributed in the genome
-Connected genes are less likely to undergo duplication
-Connected genes are less likely to have close homologs
-Connected genes are less likely to have introns

Evolutionary Effects of Connectedness

Is S. cerevisiae Robust???
-Environmentally Robust
  -Robust to temperature (4-40 C)
  -Robust to Nutrient Sources
  -Robust to Starvation
  -Robust to Osmolarity (0-1 M NaCl)
-Is it Robust to Genetic Perturbation (mutation)???
  -S. cerevisiae Genome Deletion Project has deleted 95% of all S. cerevisiae genes
  -18.7% of genes are essential
Is Cancer a Robust Network?
-Environmentally Robust
-It lives under a constant state of genomic stress

Summary
-Proteins are organized in functional units (machines)
  -these machines do virtually all the work in the cell
  -understanding the components of a machine is critical for functionally annotating the genome
  -understanding the components of a machine is critical for determining how a machine is regulated
  -the effects of mutation are great at this level
-Protein Machines are organized into higher-order Networks
  -the Network architecture has left its imprint on evolution
  -the Network is likely to be rewired under pathological conditions
    -especially in the case of cancer
  -understanding the Network is important for understanding the complex behavior of the system
Key Concept: High-throughput mapping of protein:protein interactions will provide important insights into human biology

Understanding the Network Requires a lot of Information -Direction of Information -Sign -Magnitude -Timing

Approaches for Mapping Protein:Protein Interactions
-Mapping by Inference:
  -if two proteins interact in one organism, then they interact in other organisms
    -can be extended to domains/motifs as well
  -if two proteins are coregulated on microarrays, they are likely to interact
-Direct Mapping:
  -In vitro binding experiment
  -Genetic Screen/Trap
    -Yeast 2-hybrid assay
  -Affinity Co-purifications
    -IP:Western blot
    -IP:Mass Spectrometry

Interactomics by Genetic Screens Key Concept: Genetic Complementation allows the identification of direct (binary) interactions. Uetz et al 2001

Interactomics by Genetic Screens
Key Concept: No matter how good something is, there are always problems.
Advantages of Genetic Complementation:
-can do genome-scale screening
-quick
-cheap
-adaptable
-works best when the screen is based on selection
Problems of Genetic Complementation:
-sensitive to dynamic range
-protein interaction may be incompatible with the complementation scheme
-cannot perturb the system
-more false positives than true positives

Affinity Governs the Formation of Protein Complexes
Affinity is determined by the shapes of the proteins and how well they fit together.
-hydrophobic interactions
-ionic interactions
-hydrogen bonding
Affinity is usually expressed as $K_d$, which is the concentration of one free partner at which the other partner is present in equal free and complexed concentrations. Implicitly, there is usually a mixture of free and complexed components, and this ratio is concentration dependent.
For two partners A and B forming a complex AB (the symbols stand in for the protein icons on the slide):
$K_d = \frac{[A]\,[B]}{[AB]}$

Affinity Governs the Formation of Protein Complexes
A weak interaction may only form if the concentration is high enough.
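The concentration dependence can be made concrete with a short calculation. The sketch below assumes simple 1:1 binding (A + B <-> AB) at equilibrium and solves the resulting quadratic for [AB]; the function names and the example Kd of 1 uM are illustrative assumptions, not values from the slides.

```python
import math


def complex_concentration(a_total, b_total, kd):
    """Equilibrium [AB] for A + B <-> AB; all arguments in the same molar units."""
    s = a_total + b_total + kd
    # Physically meaningful root of [AB]^2 - s*[AB] + a_total*b_total = 0
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def fraction_of_a_bound(a_total, b_total, kd):
    """Fraction of protein A that is found in the complex."""
    return complex_concentration(a_total, b_total, kd) / a_total


kd = 1e-6  # assumed: a fairly weak interaction, Kd = 1 uM
for conc in (1e-8, 1e-6, 1e-4):
    bound = fraction_of_a_bound(conc, conc, kd)
    print(f"{conc:.0e} M of each partner -> {bound:.3f} of A in complex")
```

With both partners far below Kd almost nothing is complexed, while far above Kd most of the protein is in the complex, which is the point of the slide.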

Interactomics by Co-purification
Key Concept: Interacting proteins will co-purify.
TAP tagging: Rigaut et al., Nat. Biotech. 1999.

Interactomics by Co-purification
Advantages of Co-purification:
-proteins isolated from their native source
-the system can be perturbed
Problems of Co-purification:
-sensitive to dynamic range
-real interactions may be lost during purification
-can be difficult to purify the target protein
-no amplification
-need a way to identify the co-purifying proteins

Affinity Purification - coIP
[Figure: IP on antibody beads, followed by trypsin digest directly from the beads]

Key Concept: Cutting out steps is one of the hallmarks of high-throughput approaches. This increases the throughput and usually also increases the sensitivity.

Native Antibody is Resistant to Trypsin

Reduced/Denatured Antibody is Sensitive to Trypsin

Key Concept: Complex mixtures cannot be manually interpreted. The average protein generates ~50 proteolytic fragments, so you will have 1000s and 1000s to interpret.
[Figure labels: antibody, bead, IP, NS]

Sources of Non-Specific Binding
-Not enough washing.
  -Biofluids have a high dynamic range, so you must wash away the super abundant stuff to see the less concentrated proteins
-Proteins that stick to the beads
-Proteins that stick to the antibodies on the beads
-Proteins that stick to the wall of the tube
-Proteins that stick to your complex of interest
-Proteins that are real binders but are biologically irrelevant

Nonspecific Binding is Reproducible

Are All Protein Complexes Biologically Relevant?
An interaction will be selected for if it is beneficial.
An interaction will be selected against if it is detrimental.
What happens if the interaction is neither beneficial nor detrimental?
What would be the cost of allowing only beneficial interactions?

Protein ID by Mass Spectrometry

Multidimensional Chromatography (MudPIT)

Comparison of Three Analysis Techniques on Lysates
-10-100 proteins (6 hours)
-100-300 proteins (2 hours)
-1000-6000 proteins (10 hours)

Comparison of Three Analysis Techniques on IPs
-53 proteins (6 hours)
-76 proteins (2 hours)
-82 proteins (10 hours)

Protein ID by Mass Spectrometry ~10,000 MS/MS per hour Key Concept: LC-MS/MS workflows can not be manually interpreted

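Spectra Matched
[Figure: the acquired spectrum (intensity 0-100%) is compared against a theoretical spectrum of y/b ions, where each predicted peak is either present (1) or absent (0)]

Compute a Correlation Score
[Figure: matched peaks (y/b ions); acquired spectrum intensities (0-100%) x predicted? (1,0)]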
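The matching step can be sketched in a few lines. This is a simplified illustration, not the scoring function of any particular search engine: it assumes singly charged b and y ions, an unmodified peptide, a fixed m/z tolerance, and predicted intensities of 1 for every theoretical ion, so the dot product reduces to the sum of the matched observed intensities. The function names are hypothetical.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def theoretical_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


def match_score(observed, peptide, tolerance=0.5):
    """observed: list of (m/z, relative intensity 0-1). Returns (matched ions, dot product)."""
    b_ions, y_ions = theoretical_ions(peptide)
    matched = 0
    dot = 0.0
    for mz in b_ions + y_ions:
        hits = [inten for obs_mz, inten in observed if abs(obs_mz - mz) <= tolerance]
        if hits:
            matched += 1
            dot += max(hits)  # predicted intensity is 1, so the product is the observed value
    return matched, dot


# Build an idealised "acquired" spectrum from SAMPLER, then score two candidates.
b, y = theoretical_ions("SAMPLER")
spectrum = [(mz, 1.0) for mz in b + y]
print(match_score(spectrum, "SAMPLER"))
print(match_score(spectrum, "AAAAAAA"))
```

Every candidate peptide receives some score, which is why the scores have to be interpreted statistically rather than taken at face value.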

The Truth about Spectral Matching
-Spectral matching produces an answer for every spectrum, even those that are artifacts.
-Experimental spectra always deviate from theoretical spectra.
-A high correlation score is not a guarantee that the match is correct.
-The peptide must be in the database in order to be found.

Peptide ID by Mass Spectrometry

Peptide IDs can be clustered into Protein IDs

Mapping Peptides to Proteins is NOT easy!

Single Proteins to Protein Lists

How do you know which matches to trust???

The Truth about Spectral Matching
-Spectral matching produces an answer for every spectrum, even those that are clearly artifacts.
-Experimental spectra always deviate from theoretical spectra.
-A high correlation score is not a guarantee that the match is correct.
-The peptide must be in the database in order to be found.
-An E-value is easily calculated using the ~11,000 incorrect peptides.
-The false discovery rate is easily calculated using a decoy database.
Key Concept: Statistics are required for the proper interpretation of MS/MS data.

Histogram of Correlation Scores
[Plot: # results vs. hyperscore; the bulk of the histogram is incorrect IDs]
Highest scoring match is assumed to be correct

Significant Scores
[Plot: # results and log(# results) vs. hyperscore, with the significant scores marked]

Estimating E-values
[Plot: # results and log(# results) vs. hyperscore; E-value = 10^-8.2]

Interpreting E-values
[Plot: log(# results) vs. hyperscore; E-value = 10^-8.2]
The E-value is the number of matches you expect to find at random, given the search parameters. Put another way, it is the chance of getting a match this good from this spectrum by random chance. So this would be a chance of 10^-8.2, or 1 in ~150,000,000.
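The extrapolation suggested by the log(# results) plots can be sketched as a straight-line fit. This is an illustration under stated assumptions rather than the exact procedure of any search engine: it assumes the incorrect matches have a log-linear survival function (number of matches scoring at least s), fits that line by least squares, and extends it to the top score. The function name and the synthetic scores are hypothetical.

```python
import math


def estimate_e_value(null_scores, top_score):
    """Extrapolate the log10 survival counts of the incorrect matches to top_score.

    Needs at least two distinct score values among null_scores.
    """
    xs, ys = [], []
    for threshold in sorted(set(null_scores)):
        survivors = sum(1 for s in null_scores if s >= threshold)
        xs.append(float(threshold))
        ys.append(math.log10(survivors))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10.0 ** (intercept + slope * top_score)


# Synthetic incorrect-match scores: the number scoring >= s halves with every point.
null_scores = [s for s in range(10) for _ in range(2 ** (9 - s))] + [10]

e_value = estimate_e_value(null_scores, top_score=30)
print(f"E-value = {e_value:.2e} (log10 = {math.log10(e_value):.2f})")
```

A top score far to the right of the incorrect-match histogram gives a very small extrapolated count, which is what a small E-value expresses.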

Interpreting E-values
[Plot: log(# results) vs. hyperscore; E-value = 10^-8.2 vs. E-value = 10^-3.9]
Changing the search parameters will change the statistics: for example, allowing many post-translational modifications and/or amino acid substitutions, using a very large database, or allowing large mass errors. A decoy or false database can be used to verify the statistical models being used.

Use a Decoy Database to Determine False Discovery Rate
A good decoy database:
-does not contain any correct hits
-is the same size as the query database
-has the same distribution of amino acids
-has the same size distribution of proteolytic fragments (peptides)
-can be reproduced by other labs
Reversed databases solve most of these constraints
-so the protein RSAMPLER digested with trypsin gives:
  -SAMPLER (forward)
  -ELPMASR (reverse, from the reversed protein RELPMASR)
-using the reverse typically only gives problems with palindromic sequences, which thankfully are rare*.
*except in viruses!

Use a Decoy Database to Determine False Discovery Rate By definition: Everything from the Reversed database is incorrect!
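Building a reversed decoy is simple enough to show in full. The sketch below assumes a simplified trypsin rule (cleave after every K or R, ignoring the proline exception and missed cleavages) and a hypothetical "REV_" prefix for decoy entry names.

```python
def tryptic_digest(protein):
    """Cleave after every K or R (simplified: no proline rule, no missed cleavages)."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def reversed_decoy(database):
    """Reverse every sequence; the REV_ prefix marks decoy entries (assumed convention)."""
    return {"REV_" + name: seq[::-1] for name, seq in database.items()}


target = {"demo": "RSAMPLER"}
decoy = reversed_decoy(target)

forward_peptides = tryptic_digest(target["demo"])
reverse_peptides = tryptic_digest(decoy["REV_demo"])
print(forward_peptides)
print(reverse_peptides)

# Peptides present in both digests (single residues, palindromes) cannot be told apart.
print(set(forward_peptides) & set(reverse_peptides))
```

Because the decoy is derived deterministically from the target, it has the same size and amino acid composition and can be regenerated by any other lab.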

Use a Decoy Database to Determine Global False Discovery Rates
[Plot: cumulative global FDR at successive score cutoffs: < 0.5%, < 1%, < 1.5%, < 2.0%, < 2.3%, < 2.6%]

Calculating a local False Discovery Rate
[Plot: number of spectra in each bin, forward vs. reverse database; bin boundaries: 0, -1, -1.5, -3.0, -4.0]
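Both rates follow directly from counting forward and reverse hits. The sketch below assumes each match is a (score, is_decoy) pair where a higher score is better (the slides bin on log E-values, where lower is better, so the comparison would be flipped). The function names and the example matches are hypothetical.

```python
def global_fdr(matches, threshold):
    """Reverse hits divided by forward hits among all matches scoring >= threshold."""
    forward = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    reverse = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return reverse / forward if forward else 0.0


def local_fdr(matches, low, high):
    """Reverse-to-forward ratio inside a single score bin [low, high), capped at 1."""
    forward = sum(1 for score, is_decoy in matches if low <= score < high and not is_decoy)
    reverse = sum(1 for score, is_decoy in matches if low <= score < high and is_decoy)
    if forward == 0:
        return 1.0 if reverse else 0.0
    return min(1.0, reverse / forward)


# Hypothetical search results: (score, matched the reversed database?)
matches = [
    (10, False), (9, False), (8, False), (7, True),
    (6, False), (5, True), (4, False), (3, True),
]

print(global_fdr(matches, threshold=6))  # everything at or above the cutoff
print(local_fdr(matches, low=6, high=9))  # only the matches inside one bin
```

The global rate describes the whole list above a cutoff, while the local rate describes how trustworthy the matches in one particular score range are.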

Predicting phosphorylation sites
[Figure: candidate sites ranked from low confidence to high confidence; Incorrect (n = 51), S118 (n = 6), S153 (n = 54)]

Using FDR to choose the best Database

Amino Acid Substitutions can be modeled using Statistics

Allowing for Amino Acid Substitutions 117 vs 178 proteins

So now what do I do?

Interpreting Protein Lists
Comparison based on Gene Ontology (GO)
[Figure panels: Control vs. Experiment]

Functional Clustering of Protein Lists

Domain Enrichment

Enrichment of Components from known Pathways
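Enrichment of a GO term, domain, or pathway in a protein list is commonly assessed with a hypergeometric (one-sided Fisher) test. The slides do not say which statistic was used, so the following is only a sketch of that common approach; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when `sample` proteins are drawn from `population`,
    of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 6000 proteins in the proteome, 120 in a pathway,
# a hit list of 50 proteins, 8 of which belong to the pathway.
p = enrichment_p_value(population=6000, annotated=120, sample=50, hits=8)
print(f"p = {p:.2e}")
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction before they are interpreted.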

Build Networks

http://string-db.org/

Other Resources
DAVID: http://david.abcc.ncifcrf.gov/
-GO-based enrichment. Clustering of redundant GO terms. Mapping onto KEGG pathways. Mapping onto disease pathways, etc.
BioGRID: thebiogrid.org
-An online interaction repository with data compiled through comprehensive curation efforts.
MIPS: http://mips.helmholtz-muenchen.de/proj/ppi/
-Manually curated protein-protein interaction database

Summary
Proteomics:
-large-scale analysis of proteins (really peptides)
-statistical analysis is required for interpretation
-can be used to address a wide range of biological problems
-best used to answer discrete questions
  -things that cannot be answered by genomic techniques
    -protein complexes
    -protein modification
    -other post-translational events
    -changes in subcellular localization
  -question will help determine which hits are chosen for validation
Feel free to email me questions: [email protected]

Ubiquitin:
-Short protein that is covalently attached to other proteins
-7 lysine residues, all of which can form poly-Ub chains
  -K48 chains: involved in proteasomal degradation
  -K63 chains: involved in signaling
  -K11 chains: ????
  -K6 chains: (DNA damage)
  -K29 chains: ????

K6-Ub Pulldown
[Figure: HA-tagged K6-Ub chain with unknown (?) binding partners]
-Tagged ubiquitin with only one available lysine.
-Pull down K6-linked poly-Ub chain.
-Identify proteins.
K6 chains are assembled by BRCA1

K6-Ubiquitin Pulldown

Hit Criteria
-Not in the control sample
-Not a commonly known contaminant
-Good score (more than one peptide)
  -Expressed as a False Discovery Rate
-Seen in repeat experiments
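The criteria above translate naturally into a filter. The sketch below assumes each replicate IP is a dictionary mapping a protein name to its number of identified peptides, and uses placeholder protein names; the thresholds, function name, and data layout are assumptions for illustration, and the score/FDR criterion is reduced here to the "more than one peptide" rule.

```python
def select_hits(replicates, control, contaminants, min_peptides=2, min_replicates=2):
    """Keep proteins with enough peptides in enough replicates that are
    absent from the control IP and from the contaminant list."""
    seen = {}
    for replicate in replicates:
        for protein, n_peptides in replicate.items():
            if n_peptides >= min_peptides:
                seen[protein] = seen.get(protein, 0) + 1
    return sorted(
        protein
        for protein, count in seen.items()
        if count >= min_replicates
        and protein not in control
        and protein not in contaminants
    )


# Hypothetical data with placeholder names (not results from the slides).
replicates = [
    {"PROT_A": 3, "PROT_B": 1, "KERATIN": 5, "PROT_C": 2},
    {"PROT_A": 2, "PROT_C": 4, "KERATIN": 6, "PROT_D": 2},
]
control = {"PROT_C"}
contaminants = {"KERATIN"}

hits = select_hits(replicates, control, contaminants)
print(hits)
```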

Results from a representative non-denaturing K6-ubiquitin IP
[Chart: values as extracted: 4959 4079; categories: Potential Hits; Ub and Ub-binding proteins; Excluded (heat shock, hnRNP, ribosomal, keratin, histones); Proteins also found in the control IP]

Ubiquitin-binding proteins: potential hits
[Figure: hits ranked from good scores (high confidence) to poor scores (low confidence)]

Overlap Between K6 and K63 Pulldowns
[Venn diagram: UB-K6 vs. UB-K63; counts as extracted: 20 22233]

Werner helicase interacting protein 1 (WHIP1)
[Figure: hits ranked from good scores (high confidence) to poor scores (low confidence)]

WHIP domain architecture
[Diagram: Rad18-like zinc finger; AAA+ ATPase]
Does not contain any recognizable ubiquitin-binding domain

Why does WHIP co-IP with ubiquitin?
-WHIP is ubiquitinated (covalent bond)
-WHIP is a ubiquitin-binding protein (non-covalent interaction)
[Figure: WHIP shown with attached and bound Ub chains]

[Western blot: IP from bacterial lysate with anti-FLAG beads (Sigma); W.B. anti-ubiquitin (6C1) 1:1000, secondary = anti-mouse TrueBlot, exposure = 10 sec. Sample labels: input, FLAG-BAP, FLAG-WHIP; chain types: mono-Ub, K48, K63; band labels: Ub1 to Ub6]

Co-IP of WHIP with various poly-Ub chains in vivo
IP = anti-HA (ubiquitin); WB = anti-FLAG (WHIP); IPs from doubly-transfected 293 cells
[Western blot lanes: FLAG-Whip + - + + + + + +; HA-Ub - + K6 K11 K29 K48 K63 -; markers 250, 150, 100, 75, 50, 37 kD; lower panel: anti-FLAG IP]

WHIP is Ubiquitinated
Ni-NTA pulldown in 8 M urea from 293T cells, W.B. = anti-FLAG (M2) 1:5000
[Western blot lanes: WHIP-FLAG-MAT - and +; markers 250, 150, 100, 75]

WHIP Ubiquitinylation: Mass Spectrometry

PEPTIDE (aa)   SEQUENCE                MODIFICATION    E-VALUE
254-274        SLLETNEIPSLILWGPPGCGK   274 K(114.1)    6.8e-007
292-310        FVTLSATNAKTNDVRDVIK     301 K(114.1)    3.7e-004
292-306        FVTLSATNAKTNDVR         301 K(114.1)    1.4e-004
302-316        TNDVRDVIKQAQNEK         310 K(114.1)    3.7e-005
311-321        QAQNEKSFFKR             316 K(114.1)    2.6e-003
322-332        KTILFIDEIHR             322 K(114.1)    7.0e-007
333-346        FNKSQQVNAALLSR          335 K(114.1)    5.5e-012
449-462        VLITENDVKEGLQR          457 K(114.1)    2.7e-010

Legend: ubiquitinylated residue; SUMOylated residue

WHIP's zinc finger domain is necessary for ubiquitin binding
Rad18_ZF = UBZ ubiquitin binding domain
[In vivo panel: lanes -, WT, D37A, T294A; lysate blots = anti-FLAG and anti-actin; IP = anti-FLAG, blot = anti-Ub; mono-Ub; markers 200, 100, 75, 50, 33, 25, 15]
[In vitro panel: WHIP UBZ and RAD18 UBA; beads vs. input; bands Ub2 to Ub7; markers 10, 15, 20, 25, 37, 50]

UBZ Domain-Containing Proteins

Summary of UBZ Domain Binding

Overlap Between Pulldowns using Different UBZ Domains
[Venn diagram: UBZ-WHIP vs. UBZ-UBZ1; counts as extracted: 15 11135]

UBZ Domain Regulates Subcellular Localization
[Microscopy panels: EGFP, WHIP-EGFP, WHIP D37A-EGFP, UBZ1-EGFP, UBZ1 D473A-EGFP]

UBZ Domain Regulates Coupled Ubiquitination
[Western blot lanes: HA-UBZ1 -, WT, D473A]

UBZ-1 and WHIP are Differentially Regulated by UV Damage
[Blot labels as extracted: -, 1, 12; WHIP]

UBZ-1 but not WHIP Interacts with PCNA

Interaction between UBZ1 and PCNA is increased Following UV treatment

Overlap Between UBZ1 and WHIP
[Venn diagram: UBZ1 vs. WHIP; counts as extracted: 31, 1645]

Overlap Between UBZ1 and WHIP
[Network diagram nodes: WHIP1, ERCC1, ERCC4, DDB1, DDB2, Large T, dnaJ, RuvBL1, RuvBL2, UBZ1, BLAP75, WRN, BLM, Ku70, Ku80, DNA-PK, PCNA, Topo3A, RPA1, Ub]

Summary
-Proteins are part of a highly integrated network
-The UBZ domain mediates: ubiquitin binding, coupled ubiquitination, subcellular localization, protein:protein interactions
-Functional UBZ domains are found only in proteins involved in DNA replication and/or repair
-UBZ domains are frequently found in concert with PIP boxes
-The UBZ domain acts in concert with other domains to regulate the formation of ubiquitin-dependent complexes
-New UBZ domains are being found every day

Protein Networks: Giulia DeSabbata, Antonella Piccini, Michael P Myers, Fabio Rossi, Martina Colombin
Acknowledgements: Fabio Rossi, Martina Colombin, Rebecca Bish, Antonella Piccini, Giulia DeSabbata, Sandor Pongor
Providing Reagents: Bruce Stillman, Masashi Narita/Scott Lowe, Tomohiko Ohta, Toshiki Tsurimoto

Major Types of Proteomics
Survey Proteomics: qualitative or quantitative analysis of the protein component
-whole organism, tissue, cell type, or subcellular compartment
-2D gel electrophoresis -> MS
  -typically a few 100 proteins
-Multidimensional LC -> MS/MS
  -typically 1000-5000 proteins
Identification of Biomarkers
Interactomics: Mapping Protein:Protein Interactions
-Yeast 2-hybrid techniques
-high-throughput protein identification by Mass Spectrometry
Mapping Post-Translational Modifications
-High Content Mass Spectrometry
Key Concept: Proteomics is the large-scale identification of proteins or peptides