Fa 05 CSE182
CSE182-L6
Protein structure basics
Protein sequencing
Announcements
• Midterm 1: Nov 1, in class.
• Assignment 2: Online, due October 20.
Distinguishing between families
Distinguishing between families
Assignment 2
Profiles
• Start with an alignment of strings of length m, over an alphabet A.
• Build an |A| × m matrix $F = (f_{ki})$.
• Each entry $f_{ki}$ represents the frequency of symbol k in position i.
[Figure: example alignment and its profile, with sample entries 0.71, 0.14, 0.14, 0.28]
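The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `build_profile` and the toy DNA alignment are assumptions made for the example, and no pseudocounts are applied.

```python
from collections import Counter

def build_profile(alignment, alphabet):
    """Return a dict mapping each symbol k to its list of column frequencies f_ki."""
    m = len(alignment[0])
    if any(len(row) != m for row in alignment):
        raise ValueError("all aligned strings must have the same length")
    n = len(alignment)
    profile = {k: [0.0] * m for k in alphabet}
    for i in range(m):
        # Count how often each symbol occurs in column i of the alignment.
        counts = Counter(row[i] for row in alignment)
        for k in alphabet:
            profile[k][i] = counts[k] / n
    return profile

# Toy alignment (hypothetical data): 4 strings of length m = 4 over {A, C, G, T}.
alignment = ["ACGT", "ACGA", "ATGT", "ACCT"]
profile = build_profile(alignment, "ACGT")
```

Each column of the resulting matrix sums to 1, since every aligned string contributes exactly one symbol per position.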
Scoring Profiles
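$$S(i, j) = \sum_{k} f_{ki} \, M[r_k, s_j]$$
where $f_{ki}$ is the profile frequency of symbol $r_k$ at position $i$, $s_j$ is the $j$-th symbol of the sequence $s$, and $M$ is the scoring matrix.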
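The score of aligning profile position i with a sequence residue is the frequency-weighted sum of scoring-matrix entries over all alphabet symbols. A minimal sketch, assuming a toy two-letter alphabet and a hypothetical match/mismatch matrix (`profile_score`, `toy_profile` and `toy_matrix` are illustrative names, not from the lecture):

```python
def profile_score(profile, i, residue, matrix):
    """Score profile column i against one residue: sum over k of f_ki * M[r_k, residue]."""
    return sum(freqs[i] * matrix[k][residue] for k, freqs in profile.items())

# Hypothetical profile: frequencies of A and C at two positions.
toy_profile = {"A": [0.75, 0.0], "C": [0.25, 1.0]}
# Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
toy_matrix = {"A": {"A": 1, "C": -1}, "C": {"A": -1, "C": 1}}

score_a = profile_score(toy_profile, 0, "A", toy_matrix)  # 0.75*1 + 0.25*(-1)
score_c = profile_score(toy_profile, 1, "C", toy_matrix)  # 0.0*(-1) + 1.0*1
```

With a real substitution matrix such as BLOSUM in place of `toy_matrix`, the same sum rewards residues that are similar to the ones commonly seen in that column.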
Psi-BLAST idea
• Multiple alignments are important for capturing remote homology.
• Profile based scores are a natural way to handle this.
• Q: What if the query is a single sequence?
• A: Iterate:
– Find homologs using BLAST on the query.
– Discard very similar homologs.
– Align, make a profile, search with the profile.
Psi-BLAST speed
• Two time-consuming steps:
1. Multiple alignment of homologs
2. Searching with profiles
• Does the keyword search idea work?
• Multiple alignment:
– Use ungapped multiple alignments only.
• Pigeonhole principle again:
– If a profile of length m must score >= T,
– then some sub-profile of length l must score >= lT/m.
– Generate all l-mers that score at least lT/m.
– Search using an automaton.
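The seed-generation step of the pigeonhole argument can be illustrated with a brute-force sketch. It assumes a toy position-specific score table over a two-letter alphabet (all values hypothetical) and enumerates every l-mer for each non-overlapping window; a practical implementation would prune the enumeration and feed the seeds to an automaton, which is not shown here.

```python
from itertools import product

def high_scoring_lmers(pssm, alphabet, start, l, threshold):
    """Return all l-mers scoring >= threshold against columns start..start+l-1."""
    hits = []
    for letters in product(alphabet, repeat=l):
        score = sum(pssm[start + offset][c] for offset, c in enumerate(letters))
        if score >= threshold:
            hits.append("".join(letters))
    return hits

# Toy position-specific scores (hypothetical values): pssm[i][c] scores symbol c at position i.
pssm = [
    {"A": 2, "B": -1},
    {"A": -1, "B": 3},
    {"A": 1, "B": 0},
    {"A": 0, "B": 2},
]
m, l, T = len(pssm), 2, 6
# Any full-length match scoring >= T has at least one window scoring >= l*T/m.
seeds = {start: high_scoring_lmers(pssm, "AB", start, l, l * T / m)
         for start in range(0, m - l + 1, l)}
```

Here the full-length string ABAB scores 8 >= T, and it contains the seed AB in both windows, so a keyword search for the seeds would not miss it.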
Protein Domains
• An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.
• Example: The zinc finger domain is a DNA-binding domain.
• What is a domain?
– Part of a sequence that can fold independently, and is present in other sequences as well
Domain review
• What is a domain?
• How are domains expressed?
– Motifs (regular expressions and others)
– Multiple alignments
– Profiles
– Profile HMMs
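As an illustration of the first representation, a motif written in PROSITE-style syntax can be translated into an ordinary regular expression. The sketch below is an assumption-laden example: the helper `prosite_to_regex` is hypothetical, it handles only the basic syntax (residues, `x`, `[...]`, `{...}` and repeat counts), and the pattern is modeled on the C2H2 zinc finger signature; treat its exact form as illustrative rather than authoritative.

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} lists forbidden residues
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

# Pattern modeled on a C2H2 zinc finger signature (assumed for illustration).
zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")

# Synthetic sequence built to contain one occurrence of the motif.
sequence = "GG" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
hit = re.search(zinc_finger, sequence)
```

A regular expression gives a yes/no answer, which is why the later representations in the list (profiles and profile HMMs) are used when graded scores are needed.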
Domain databases
Can you speed up HMM search?
A structural view of proteins
CS view of a protein
• >sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
• MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
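From this point of view a protein is just a header plus a string over a 20-letter alphabet. A minimal sketch of reading such a record, assuming the `db|accession|entry_name` identifier layout seen in the header above (the function name `parse_fasta_record` is hypothetical):

```python
def parse_fasta_record(text):
    """Parse one FASTA record into its identifier fields, description and sequence."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    header = lines[0][1:]
    identifier, _, description = header.partition(" ")
    # Assumes the identifier has the form db|accession|entry_name.
    database, accession, entry_name = identifier.split("|")
    return {
        "database": database,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        "sequence": "".join(lines[1:]),  # sequence lines are concatenated
    }

record = parse_fasta_record(
    ">sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor - Bos taurus (Bovine).\n"
    "MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLC\n"
    "QTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL\n"
)
```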
Protein structure basics
Side chains determine amino-acid type
• The residues may have different properties.
• Aspartic acid (D) and glutamic acid (E) are acidic residues.
Bond angles form structural constraints
Various constraints determine 3D structure
• Constraints
– Structural constraints due to physicochemical properties
– Constraints due to bond angles
– H-bond formation
• Surprisingly, a few conformations are seen over and over again.
Alpha-helix
• 3.6 residues per turn
• H-bonds between residues i and i+4 stabilize the structure.
• First discovered by Linus Pauling
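The figure of 3.6 residues per turn implies a rotation of 360/3.6 = 100 degrees per residue, which is what a helical-wheel plot uses. A small sketch of that arithmetic (the function name and the example peptide are assumptions for illustration):

```python
import math

RESIDUES_PER_TURN = 3.6  # value quoted for the alpha-helix

def helical_wheel_angles(sequence):
    """Return (residue, angle in degrees) pairs for an ideal alpha-helix."""
    step = 360.0 / RESIDUES_PER_TURN  # about 100 degrees per residue
    return [(residue, (index * step) % 360.0) for index, residue in enumerate(sequence)]

# Hypothetical 8-residue peptide, used only to show the angles.
wheel = helical_wheel_angles("AEELLKKA")
```

Residues four positions apart differ by about 40 degrees on the wheel, so they sit on nearly the same face of the helix.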
Beta-sheet
• Each strand by itself has 2 residues per turn, and is not stable.
• Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel.
• Beta sheets have long range interactions that stabilize the structure, while alpha-helices have local interactions.
Domains
• The basic structures (helix, strand, loop) combine to form complex 3D structures.
• Certain combinations are popular: there are many sequences, but only a few folds.
3D structure
• Predicting tertiary structure is an important problem in Bioinformatics.
• Premise: Clues to structure can be found in the sequence.
• While de novo tertiary structure prediction is hard, there are many intermediate and tractable goals.
• The PDB database is a compendium of structures.
[Figure: PDB]
Searching structure databases
• Threading and other 3D alignment methods can be used to align structures.
• Database filtering is possible through geometric hashing.
Trivia Quiz
• What research won the Nobel prize in Chemistry in 2004?
• In 2002?
How are Proteins Sequenced? Mass Spec 101:
Nobel Citation 2002
Nobel Citation, 2002
Mass Spectrometry
Sample Preparation
Enzymatic Digestion (Trypsin)
+Fractionation
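The enzymatic digestion step can be simulated in silico. The sketch below assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); the rule is not stated on the slide, and missed cleavages are ignored.

```python
def trypsin_digest(protein):
    """Split a protein into peptides, cleaving after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(protein[start:])
    return peptides

# Hypothetical short sequence chosen to show both a cleaved and a blocked site.
peptides = trypsin_digest("MKMSRPLCKAG")
```

In this example the R is followed by P, so no cut is made there and the middle peptide spans both residues.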
Single Stage MS
Mass Spectrometry
LC-MS: 1 MS spectrum / second
Tandem MS
Secondary Fragmentation
Ionized parent peptide
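To make secondary fragmentation concrete, the sketch below computes the masses of prefix and suffix fragments of a peptide. It assumes the common b/y ion model with singly charged fragments, which the slide does not name, and uses standard monoisotopic residue masses; the peptide GAK is a hypothetical example.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def fragment_ladders(peptide):
    """Return singly charged b (prefix) and y (suffix) ion m/z values for each cut point."""
    masses = [RESIDUE_MASS[r] for r in peptide]
    cuts = range(1, len(peptide))
    b_ions = [sum(masses[:i]) + PROTON for i in cuts]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in cuts]
    return b_ions, y_ions

b_ions, y_ions = fragment_ladders("GAK")
parent = sum(RESIDUE_MASS[r] for r in "GAK") + WATER
```

For every cut point the b and y values add up to the parent mass plus two protons, which is one reason the two ladders can be cross-checked against each other when reading a spectrum.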