Fa 05 CSE182 CSE182-L6: Protein structure basics, Protein sequencing

Fa 05 CSE182

CSE182-L6

Protein structure basics

Protein sequencing

Announcements

• Midterm 1: Nov 1, in class.

• Assignment 2: Online, due October 20.

Distinguishing between families

Assignment 2

Profiles

• Start with an alignment of strings of length $m$, over an alphabet $A$.

• Build an $|A| \times m$ matrix $F = (f_{ki})$.

• Each entry $f_{ki}$ represents the frequency of symbol $k$ in position $i$.

[Figure: example profile; the frequency entries shown include 0.71, 0.14, 0.14 and 0.28]
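
The profile construction can be sketched in a few lines of Python. This is a minimal illustration, assuming an ungapped alignment of equal-length strings; the function name `build_profile` and the toy DNA alignment are made up for the example and do not come from the lecture.

```python
from collections import Counter

def build_profile(alignment, alphabet):
    """Return F as a dict: F[k][i] = frequency of symbol k in column i."""
    m = len(alignment[0])
    if any(len(row) != m for row in alignment):
        raise ValueError("all aligned strings must have the same length m")
    n = len(alignment)
    profile = {k: [0.0] * m for k in alphabet}
    for i in range(m):
        # Count how often each symbol occurs in column i of the alignment.
        counts = Counter(row[i] for row in alignment)
        for k in alphabet:
            profile[k][i] = counts[k] / n
    return profile

# Toy alignment (hypothetical data, not from the lecture).
alignment = ["ACGT", "ACGA", "TCGT", "ACCT"]
F = build_profile(alignment, "ACGT")
for k in "ACGT":
    print(k, F[k])
```

Each column of `F` sums to 1, so a column can be read as a probability distribution over the alphabet at that position.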

Scoring Profiles

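$$S(i, j) = \sum_{k} f_{ki}\, M[r_k, s_j]$$

where $f_{ki}$ is the frequency of symbol $r_k$ in profile column $i$, $s_j$ is the $j$-th symbol of the sequence $s$, and $M$ is the scoring matrix.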
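
As a rough sketch, the column score is a frequency-weighted average of scoring-matrix entries. The code below assumes the profile is stored as `profile[k][i]` and the scoring matrix as a dict of dicts; the +1/-1 matrix and the two-column profile are toy assumptions (in practice a substitution matrix would take their place).

```python
def column_score(profile, i, residue, M):
    """S(i, j): frequency-weighted score of profile column i against one residue."""
    return sum(freqs[i] * M[k][residue] for k, freqs in profile.items())

def ungapped_profile_score(profile, seq, M):
    """Sum of column scores for a sequence aligned to the profile without gaps."""
    return sum(column_score(profile, i, s, M) for i, s in enumerate(seq))

alphabet = "ACGT"
# Toy scoring matrix (an assumption): +1 for a match, -1 for a mismatch.
M = {a: {b: (1.0 if a == b else -1.0) for b in alphabet} for a in alphabet}
# Toy profile with m = 2 columns: profile[k][i] = f_ki.
profile = {"A": [0.75, 0.0], "C": [0.0, 1.0], "G": [0.0, 0.0], "T": [0.25, 0.0]}

print(column_score(profile, 0, "A", M))
print(ungapped_profile_score(profile, "AC", M))
```

Plugging `column_score` into a standard dynamic-programming alignment in place of a pairwise score is one way to align a sequence to a profile.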

Psi-BLAST idea

• Multiple alignments are important for capturing remote homology.

• Profile based scores are a natural way to handle this.

• Q: What if the query is a single sequence?

• A: Iterate:

– Find homologs using Blast on query

– Discard very similar homologs

– Align, make a profile, search with profile.
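
The iteration can be sketched as a loop. This is a toy control-flow illustration only: the real search step uses Blast, gapped alignments, and statistical thresholds, whereas here the search is a brute-force ungapped scan over equal-length sequences. All names, data, and thresholds are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def make_profile(seqs, alphabet):
    """profile[i][k] = frequency of symbol k in column i of the ungapped alignment."""
    m = len(seqs[0])
    return [{k: sum(s[i] == k for s in seqs) / len(seqs) for k in alphabet}
            for i in range(m)]

def profile_score(profile, seq, M):
    """Ungapped score: sum over columns i of sum_k f_ki * M[k][seq[i]]."""
    return sum(f * M[k][c] for col, c in zip(profile, seq) for k, f in col.items())

def iterate_search(query, database, alphabet, M, threshold,
                   max_identity=0.9, rounds=3):
    """Toy iteration: search, drop near-duplicates of the query, rebuild the profile."""
    members = [query]
    for _ in range(rounds):
        profile = make_profile(members, alphabet)
        hits = [s for s in database if profile_score(profile, s, M) >= threshold]
        # Discard very similar homologs so they do not dominate the profile.
        hits = [s for s in hits if identity(s, query) <= max_identity]
        new_members = [query] + sorted(set(hits))
        if new_members == members:  # converged: the hit set did not change
            break
        members = new_members
    return members

alphabet = "ACGT"
M = {a: {b: (1.0 if a == b else -1.0) for b in alphabet} for a in alphabet}
database = ["ACGTAC", "ACGTAA", "ACGAAA", "AGGAAA", "TTTTTT"]
result = iterate_search("ACGTAC", database, alphabet, M, threshold=1.5)
print(result)
```

In this toy run, `AGGAAA` scores below the threshold against the query alone but is picked up once the profile has absorbed the first round of hits, which is the point of iterating.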

Psi-BLAST speed

• Two time-consuming steps:

1. Multiple alignment of homologs

2. Searching with profiles

• Does the keyword search idea work?

• Multiple alignment:

– Use ungapped multiple alignments only

• Pigeonhole principle again:

– If a profile of length $m$ must score ≥ $T$

– Then a sub-profile of length $l$ must score ≥ $lT/m$

– Generate all $l$-mers that score at least $lT/m$

– Search using an automaton
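
A minimal sketch of the l-mer generation step, under these assumptions: the profile has already been converted to per-column scores `pssm[i][a]`, the profile is cut into consecutive blocks of length l (so l divides m), and the automaton search itself is left out. The function name and the toy scores are hypothetical.

```python
def high_scoring_lmers(pssm, start, l, cutoff, alphabet):
    """All words w of length l with sum_i pssm[start + i][w[i]] >= cutoff."""
    window = pssm[start:start + l]
    # best_rest[i] = best score achievable from columns i..l-1 (used for pruning).
    best_rest = [0.0] * (l + 1)
    for i in range(l - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(window[i][a] for a in alphabet)
    words = []

    def extend(prefix, score):
        i = len(prefix)
        if i == l:
            words.append(prefix)
            return
        for a in alphabet:
            s = score + window[i][a]
            if s + best_rest[i + 1] >= cutoff:  # prune branches that cannot reach cutoff
                extend(prefix + a, s)

    extend("", 0.0)
    return words

# Toy per-column scores over a two-letter alphabet (made up for illustration).
alphabet = "AC"
pssm = [{"A": 2, "C": -1}, {"A": -1, "C": 2}, {"A": 2, "C": -1}, {"A": 1, "C": 1}]
m, T, l = len(pssm), 6, 2
cutoff = l * T / m
seeds = {start: high_scoring_lmers(pssm, start, l, cutoff, alphabet)
         for start in range(0, m - l + 1, l)}
print(seeds)
```

Any sequence window scoring at least T against the whole profile must contain one of these seed words at the matching block, so the seed set can be handed to a multi-pattern matcher to filter the database.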

Protein Domains

• An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.

• Example: The zinc finger domain is a DNA-binding domain.

• What is a domain?

– Part of a sequence that can fold independently, and is present in other sequences as well

Domain review

• What is a domain?

• How are domains expressed?

– Motifs (Regular expression & others)

– Multiple alignments

– Profiles

– Profile HMMs
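
A motif expressed as a regular expression can be matched directly with Python's `re` module. The pattern below is modeled on a PROSITE-style zinc finger signature, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H; treat the exact pattern and the test sequence as illustrative assumptions rather than a curated database entry.

```python
import re

# PROSITE-style pattern rewritten as a Python regex (illustrative assumption):
# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
ZINC_FINGER = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")

def find_motifs(seq, pattern=ZINC_FINGER):
    """Return (start, matched substring) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]

# Made-up test sequence containing one zinc-finger-like stretch.
seq = "MSRPYACPVESCDRRFSRSDELTRHIRIHTGQKP"
print(find_motifs(seq))
```

Regular expressions are all-or-nothing: one mismatch at a fixed position loses the hit, which is part of the motivation for the softer representations (profiles, profile HMMs) in the list.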

Domain databases

Can you speed up HMM search?

A structural view of proteins

CS view of a protein

• >sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).

• MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
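
From the CS point of view the protein is just a labeled string. A minimal FASTA reader, sketched below, recovers the header fields and the sequence; the function name `parse_fasta` is hypothetical, and the header description is shortened from the record shown on the slide.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

record = """>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCK
ARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
"""
header, seq = next(parse_fasta(record))
# The first header token is split on "|" into database, accession, entry name.
db, accession, entry = header.split()[0].split("|")
print(accession, entry, len(seq))
```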

Protein structure basics

Side chains determine amino-acid type

• The residues may have different properties.

• Aspartic acid (D) and glutamic acid (E) are acidic residues.

Bond angles form structural constraints

Various constraints determine 3d structure

• Constraints

– Structural constraints due to physicochemical properties

– Constraints due to bond angles

– H-bond formation

• Surprisingly, a few conformations are seen over and over again.

Alpha-helix

• 3.6 residues per turn

• H-bonds between residue i and residue i+4 stabilize the structure.
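
The 3.6 residues per turn translate into 100 degrees of rotation per residue. The short sketch below does that arithmetic and lists how far apart residues sit when viewed down the helix axis (a helical wheel); the function names are made up for the illustration.

```python
RESIDUES_PER_TURN = 3.6                           # from the slide
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN   # 100 degrees

def wheel_angle(i):
    """Angular position (degrees, 0-360) of residue i on a helical wheel."""
    return (i * DEGREES_PER_RESIDUE) % 360.0

def angular_separation(i, j):
    """Smallest angle between residues i and j when viewed down the helix axis."""
    d = abs(wheel_angle(i) - wheel_angle(j))
    return min(d, 360.0 - d)

for offset in range(1, 8):
    print(offset, round(angular_separation(0, offset), 1))
```

Residues three, four, and seven positions apart end up within 60 degrees of each other, so they lie on roughly the same face of the helix.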

• First discovered by Linus Pauling

Beta-sheet

• Each strand by itself has 2 residues per turn, and is not stable.

• Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel.

• Beta sheets have long-range interactions that stabilize the structure, while alpha-helices have local interactions.

Domains

• The basic structures (helix, strand, loop) combine to form complex 3D structures.

• Certain combinations are popular. Many sequences, but only a few folds

3D structure

• Predicting tertiary structure is an important problem in Bioinformatics.

• Premise: Clues to structure can be found in the sequence.

• While de novo tertiary structure prediction is hard, there are many intermediate and tractable goals.

• The Protein Data Bank (PDB) is a compendium of structures.

Searching structure databases

• Threading and other 3D alignment methods can be used to align structures.

• Database filtering is possible through geometric hashing.

Trivia Quiz

• What research won the Nobel prize in Chemistry in 2004?

• In 2002?

How are Proteins Sequenced? Mass Spec 101:

Nobel Citation 2002

Mass Spectrometry

Sample Preparation

Enzymatic Digestion (Trypsin)

+ Fractionation
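
The digestion step is easy to simulate in silico. The sketch below uses the commonly quoted simplification that trypsin cleaves after K or R unless the next residue is P, and ignores missed cleavages; both are assumptions not stated on the slide, and the function name is hypothetical.

```python
def tryptic_digest(protein):
    """Split a protein into peptides by cleaving after K or R, unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide with no K/R at its end
    return peptides

# Made-up sequence: the first K is followed by P, so no cut is made there.
peptides = tryptic_digest("AAKPGGRCCKDD")
print(peptides)
```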

Single Stage MS

Mass Spectrometry

LC-MS: 1 MS spectrum / second

Tandem MS

Secondary Fragmentation

Ionized parent peptide
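
As a hedged illustration of what secondary fragmentation produces, the sketch below computes the masses of prefix and suffix fragments of a parent peptide, usually called b and y ions (a naming the slide does not use). It assumes singly charged fragments, standard monoisotopic residue masses rounded to five decimals, and no modifications; the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def parent_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[a] for a in peptide) + WATER

def fragment_ions(peptide):
    """Singly charged prefix (b) and suffix (y) fragment m/z values, one pair per cut."""
    b = [sum(RESIDUE_MASS[a] for a in peptide[:i]) + PROTON
         for i in range(1, len(peptide))]
    y = [sum(RESIDUE_MASS[a] for a in peptide[i:]) + WATER + PROTON
         for i in range(1, len(peptide))]
    return b, y

peptide = "MSR"
b_ions, y_ions = fragment_ions(peptide)
print(round(parent_mass(peptide), 4))
print([round(x, 4) for x in b_ions])
print([round(x, 4) for x in y_ions])
```

For every cut position the b and y values add up to the parent mass plus two protons, which gives a quick consistency check when reading a tandem spectrum.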