Protein Structure in 10 Points · 2005. 10. 18. · Proline isomerization Cyclophilin catalyzes Pro cis-trans isomerization 20% 99.9% 0.1% 80%. Conformational change - calmodulin

Protein Structure in 10 Points

• Globular proteins are compact and densely packed, with only a few empty spaces (cavities)

• Protein conformation and dynamics are coded in the amino acid sequence

• A protein in its native conformation is at an energy minimum, which results in spontaneous folding

• In soluble globular proteins, hydrophobic groups are predominantly on the inside, hydrophilic on the outside of the globule

• Most backbone NH and C=O groups are involved in H-bonds to other protein atoms

• Homology of sequence (>25-30% identity) implies similarity of structure

• There are clear statistical preferences for some amino acids in some positions in some secondary structures (but secondary structure prediction from aa-sequence can be wrong…)

• Loops and turns tend to be on the surface of a globular protein

• Proteins are dynamic molecules capable of a wide range of motions

• Protein structure is dominated by secondary structure

Stability, Folding and Kinetics

Levinthal’s Paradox:

There are so many unfolded states that if the polypeptide chain had to search through them all in order to find the correct (minimum energy) folded state it would take longer than the age of the Universe

The Paradox

Reference: Levinthal, C. (1968). Are There Pathways for Protein Folding? J. Chim. Phys. PCB 65, 44-45
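The size of the search in Levinthal's argument can be illustrated with a back-of-the-envelope calculation. The numbers below (100 residues, 3 conformations per residue, 10^13 conformations sampled per second) are illustrative assumptions, not values taken from the slide.

```python
# Back-of-the-envelope Levinthal estimate. All constants are assumptions
# chosen for illustration only.
RESIDUES = 100                # assumed chain length
STATES_PER_RESIDUE = 3        # assumed conformations per residue
SAMPLING_RATE = 1e13          # assumed conformations tried per second
AGE_OF_UNIVERSE_S = 4.3e17    # roughly 13.8 billion years, in seconds


def exhaustive_search_time(residues, states_per_residue, rate):
    """Seconds needed to visit every chain conformation exactly once."""
    conformations = states_per_residue ** residues
    return conformations / rate


seconds = exhaustive_search_time(RESIDUES, STATES_PER_RESIDUE, SAMPLING_RATE)
print(f"exhaustive search time: {seconds:.2e} s")
print(f"ratio to age of Universe: {seconds / AGE_OF_UNIVERSE_S:.2e}")
```

Even with generous sampling rates the exhaustive search is hopeless, which is why folding is expected to follow pathways rather than a random search.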

Conformational Energy

Interactions:

Bonds, bond angles, torsions, electrostatic, van der Waals, (hydrogen bonds)
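A minimal sketch of how such interaction terms are often written in a molecular mechanics energy function. The functional forms (harmonic bond, cosine torsion, Lennard-Jones, Coulomb) are common textbook choices and an assumption here; the slide does not say which forms or parameters were used.

```python
import math

# Approximate conversion factor so that charges in units of e and
# distances in Angstrom give energies in kcal/mol.
COULOMB_CONST = 332.06


def bond_energy(r, r0, k):
    """Harmonic bond stretch around the reference length r0."""
    return k * (r - r0) ** 2


def torsion_energy(phi_deg, barrier, periodicity, phase_deg=0.0):
    """Cosine torsion term with the given barrier height and periodicity."""
    arg = math.radians(periodicity * phi_deg - phase_deg)
    return 0.5 * barrier * (1.0 + math.cos(arg))


def lennard_jones(r, epsilon, r_min):
    """van der Waals term with well depth epsilon at separation r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic interaction between two point charges."""
    return COULOMB_CONST * q1 * q2 / (dielectric * r)
```

Bond angles are usually treated like bonds (a harmonic term in the angle); hydrogen bonds are often covered implicitly by the electrostatic and van der Waals terms, which may be why the slide lists them in parentheses.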

The free energy, including entropy, rules!!

Free Energy, Entropy and all that


Proteins are only marginally stable (ΔG ~ 10 kcal/mol), and may denature if the temperature is raised above normal by a few °C
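To see what a stability of about 10 kcal/mol means, one can assume a simple two-state (folded/unfolded) equilibrium. The sketch below takes ΔG as the free energy of unfolding (positive when the folded state is favored); that sign convention and the two-state model are assumptions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def fraction_folded(delta_g_unfold, temperature_k):
    """Two-state model: K = [U]/[F] = exp(-dG_unfold / RT)."""
    k_unfold = math.exp(-delta_g_unfold / (R_KCAL * temperature_k))
    return 1.0 / (1.0 + k_unfold)


for dg in (10.0, 5.0, 1.0, 0.0):
    print(f"dG = {dg:4.1f} kcal/mol -> folded fraction {fraction_folded(dg, 310.0):.6f}")
```

The sketch keeps ΔG fixed; in a real protein ΔG itself changes with temperature, which is what makes denaturation set in over a narrow temperature range.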

The Oildrop Model of Protein Folding

Hydrophobic sidechains thus tend to be buried inside soluble proteins, but what happens to polar groups in the backbone?


Barnase - one major pathway


Lysozyme - two different pathways.

There are two domains

Disulfide bridge formation: BPTI

Proline isomerization

Cyclophilin catalyzes Pro cis-trans isomerization

[Figure: cis/trans equilibrium populations, labeled 20% / 80% and 0.1% / 99.9%]

Conformational change - calmodulin

Understanding protein folding via free-energy surfaces from theory and experiment

Aaron R. Dinner, Andrej Sali, Lorna J. Smith, Christopher M. Dobson and Martin Karplus TIBS 25 – JULY 2000

RG (radius of gyration) measures the size of the protein

Ramachandran diagram for alanine as “energy” contour map
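As a sketch of how RG can be computed from coordinates: the version below treats all atoms as having equal mass, which is an assumption (a mass-weighted form is also common).

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms from their centroid.

    coords is a list of (x, y, z) tuples; all atoms are weighted equally.
    """
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    sq_sum = sum(
        sum((c[i] - centroid[i]) ** 2 for i in range(3)) for c in coords
    )
    return math.sqrt(sq_sum / n)


# Two atoms 2 units apart: each lies 1 unit from the centroid.
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

A compact, folded chain gives a small RG; an extended chain of the same length gives a larger one, which is why RG is a convenient progress coordinate for collapse.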

Model of Protein Folding

The Model:

• a simple lattice on which the polymer is built

• favorable (native) contacts and unfavorable (non-native) contacts

The number of states can be enumerated, and the global free energy (F) minimum identified

Q0: number of native contacts

C: total number of contacts
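The two progress variables can be counted directly for a chain on a cubic lattice. This is a minimal sketch: the chain representation (a list of integer lattice sites) and the way native contacts are supplied (a set of index pairs) are assumptions made for illustration.

```python
def contacts(chain):
    """Index pairs (i, j) of non-bonded residues on adjacent lattice sites."""
    found = set()
    for i in range(len(chain)):
        # Residues i and i+1 are covalently bonded, so start at i + 2.
        for j in range(i + 2, len(chain)):
            lattice_dist = sum(abs(a - b) for a, b in zip(chain[i], chain[j]))
            if lattice_dist == 1:
                found.add((i, j))
    return found


def count_contacts(chain, native_contacts):
    """Return (C, Q0): total contacts and how many of them are native."""
    present = contacts(chain)
    return len(present), len(present & native_contacts)


# A 4-residue chain folded into a square: residues 0 and 3 touch.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
native = {(0, 3)}
print(count_contacts(square, native))
print(count_contacts(extended, native))
```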

Model Properties – Energetic and Entropic Components

Same native structure in both cases, but in (b,d) the native contacts are not as strong

Fast and Slow Folding Pathways of Lysozyme

Hen lysozyme has 129 aa-residues in two domains

Sequence of events mapped out using NMR hydrogen exchange protection experiments

Fast and Slow Folding Pathways of Model

This 125-mer lattice model shows behavior very similar to that of experimental lysozyme

Core and surface contacts are monitored

Modeling a 3D Structure

• It is sometimes very difficult to obtain an experimental structure.

• Can one construct theoretical 3D models?

• Today - Not really, if just based on aa sequence

• Homology modeling, or comparative modeling, works fairly well, but requires access to a known structure of a similar protein

Bioinformatics - Concepts

• Identity - Homology (“of common origin”)

• Distance - Similarity

• Score/Scoring matrix/z-score

• Global vs Local Alignment

• Multiple alignment

• Dynamic Programming

• Artificial Neural Networks (ANN)

Sequence Databases

International Nucleotide Sequence Database Collaboration:

• GenBank at NIH

• EMBL

• DDBJ (DNA DataBank of Japan)

GenBank doubles in size every 14 months!!

3 000 000 000 bases from 47 000 species (late 1999)

NCBI (National Center for Biotechnology Information) at NIH:

http://www.ncbi.nlm.nih.gov/

Example protein sequence in FASTA format:

>4LZM:_ LYSOZYME (E.C.3.2.1.17) (HIGH SALT) - CHAIN _
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITK
DEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRM
LQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
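A FASTA record is a header line starting with ">" followed by one or more sequence lines. A minimal parser sketch, using only the standard library (the function name `parse_fasta` is made up for this example):

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


LYSOZYME = """>4LZM:_ LYSOZYME (E.C.3.2.1.17) (HIGH SALT) - CHAIN _
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITK
DEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRM
LQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
"""

records = parse_fasta(LYSOZYME)
name, sequence = records[0]
print(name)
print(len(sequence), "residues")
```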

Identity? Similarity?

Two identical sequences are easy to recognize, but spotting a relationship gets progressively more difficult as they begin to differ. Sequences may also be of different lengths: we have gaps due to insertions or deletions (indels).

s: ACACACA
t: ACACACA

s: AGCACACA
t: ACACACTA

s: AGCACAC-A
t: A-CACACTA

or

s: AG-CACACA
t: ACACACT-A
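The two alternative gapped alignments of s and t can be compared by counting matches, mismatches and gap columns. A small sketch (the function name is made up for this example):

```python
def alignment_stats(s, t):
    """Count (matches, mismatches, gap columns) in a gapped pairwise alignment."""
    if len(s) != len(t):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(s, t):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


print(alignment_stats("AGCACAC-A", "A-CACACTA"))  # (7, 0, 2)
print(alignment_stats("AG-CACACA", "ACACACT-A"))  # (5, 2, 2)
```

Both alignments use two gaps, but the first recovers seven identical positions and the second only five, so a sensible scoring scheme should prefer the first.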

Homology

The biological approach:

During evolution, DNA sequences (and proteins) diverge due to point mutations and more sophisticated events. Two sequences which share a common evolutionary origin are said to be homologous, and our task is then to find these relationships even for distantly related sequences.

We may thus use our knowledge about evolution and mutations in our quest for homology.

Scoring

DNA               Protein
s: CUUCCGAAA      s: Leu-Pro-Lys
t: CUAGCGAGA      t: Leu-Ala-Arg

We need to consider the “degree of change” in a substitution, so we need a scoring scheme:

Substitution (score) matrix - there are 210 (20x19/2+20) pairs of amino acids, and we need a number, a score, indicating how similar we consider two amino acids to be. For a given alignment of two sequences we sum the pairwise scores and add gap penalties to obtain the total score for the alignment; sometimes the distance between two sequences is considered instead.
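The pair count and the sum-of-scores idea can be sketched as follows. The three substitution scores and the gap penalty are toy values chosen for illustration, not entries of any particular published matrix, and the linear per-column gap penalty is an assumption.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_unordered_pairs(alphabet):
    """Number of unordered residue pairs, including identical pairs."""
    return len(list(combinations_with_replacement(alphabet, 2)))


def alignment_score(s, t, matrix, gap_penalty):
    """Sum pairwise substitution scores; each gap column adds gap_penalty."""
    total = 0
    for a, b in zip(s, t):
        if a == "-" or b == "-":
            total += gap_penalty
        else:
            total += matrix[frozenset((a, b))]
    return total


# Toy scores (assumed) covering only the pairs in the Leu-Pro-Lys example.
TOY_SCORES = {
    frozenset("L"): 4,    # Leu-Leu, identical
    frozenset("PA"): -1,  # Pro-Ala
    frozenset("KR"): 2,   # Lys-Arg, both positively charged
}

print(n_unordered_pairs(AMINO_ACIDS))                 # 20*19/2 + 20
print(alignment_score("LPK", "LAR", TOY_SCORES, -4))  # 4 - 1 + 2
```

Keying the matrix by an unordered pair reflects that a substitution matrix is symmetric, which is where the 210 in the text comes from.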

Specific Algorithms

• Global Alignment - find optimal alignment of two complete sequences: Needleman-Wunsch

• Local Alignment - find optimal alignment of fragments of a sequence: Smith-Waterman

• Heuristic (“less stringent”, but faster): FASTA and BLAST (Basic Local Alignment Search Tool) use local high-scoring regions to find initial alignments which can be extended
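The two exact methods are both dynamic programming over a score table and differ mainly in one detail: the local version never lets a score drop below zero and reports the best cell anywhere in the table. A compact sketch that returns only the optimal score, assuming a simple match/mismatch score and a linear gap penalty (real implementations use a substitution matrix and usually affine gaps):

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of s and t."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align s[i-1] with t[j-1]
                           prev[j] + gap,       # gap in t
                           cur[j - 1] + gap))   # gap in s
        prev = cur
    return prev[-1]


def smith_waterman(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal local alignment score of any pair of fragments of s and t."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0]
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(needleman_wunsch("AGCACACA", "ACACACTA"))
print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

Recovering the alignment itself, not just its score, requires keeping the full table (or traceback pointers) instead of only the previous row.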

Sequence Alignment

Overall SCP and SFCP are 80% similar

Multiple sequence alignment helps identify the real similarities

Structure Comparison

Measure of structural similarity:

• Root Mean Square Distance (RMSD) between equivalent, superposed atoms

• Caveats: alignment - use sequence and/or secondary structure, and superposition

• Indels also pose a problem

• Distances between Cα atom pairs in one structure can be compared to the same distances in the second structure

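$$\mathrm{RMSD} = \min_{\text{orientations}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{r}_i^{(1)} - \mathbf{r}_i^{(2)} \right)^2}$$

where $\mathbf{r}_i^{(1)}$ and $\mathbf{r}_i^{(2)}$ are the positions of the i-th pair of equivalent atoms in structures 1 and 2, and N is the number of such pairs.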
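A sketch of the two similarity measures mentioned on this slide. The first function assumes the structures are already superposed, i.e. it does not perform the minimization over orientations; the second compares intra-structure distances and therefore needs no superposition. Function names are made up for this example.

```python
import math


def rmsd(coords1, coords2):
    """RMSD between equivalent atoms of two already superposed structures."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq_sum = sum(math.dist(p, q) ** 2 for p, q in zip(coords1, coords2))
    return math.sqrt(sq_sum / len(coords1))


def distance_rmsd(coords1, coords2):
    """RMS difference between corresponding atom-pair distances.

    Needs at least two atoms per structure.
    """
    n = len(coords1)
    sq_diffs = [
        (math.dist(coords1[i], coords1[j]) - math.dist(coords2[i], coords2[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(x, y, z + 2.0) for x, y, z in a]   # same shape, shifted along z
print(rmsd(a, b))           # sensitive to the shift
print(distance_rmsd(a, b))  # insensitive to rigid-body motion
```

The shifted copy shows why superposition matters: the plain RMSD reports a difference for two identical shapes, while the distance-based measure does not.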

High sequence similarity also gives high structural similarity

Protein Folds

How many folds are there? (There are ~30 000 human genes)

Current estimates: 1 000-10 000

We know about 100 different folds (1%-10%), almost exclusively of water-soluble proteins. Only a handful of membrane protein structures are known, even though membrane proteins may account for about 1/3 of all proteins.

There are no reliable methods today for predicting a 3D structure from an amino-acid sequence alone. “Guessing” a fold based on the known structure of a homologous protein is the best we can do: homology modeling works at >25-30% similarity

Protein Data Bank http://www.rcsb.org

PDBSum http://www.biochem.ucl.ac.uk/bsm/pdbsum/

Folds in the PDB

New folds

Old folds

CATH http://www.biochem.ucl.ac.uk/bsm/cath/

Class, Architecture, Topology, Homologous Superfamily

SCOP http://scop.mrc-lmb.cam.ac.uk/scop/

Structural Classification Of Proteins

DALI http://www.ebi.ac.uk/dali/

Structure comparison server

Homology Modeling

• Problem: Have sequence of protein and want 3D structure, but no experimental structure is available.

• Find homologous protein(s) with known 3D structure (>25% similarity recommended!)

• Align sequences (multiple alignment HELPS!); it also helps if you have multiple templates and can use them to identify structurally conserved regions

• Identify conserved and variable regions

• Generate core coordinates from template(s)

• Generate conformations for loops

• Build side-chain conformations

• Refine and evaluate the model
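The first steps after alignment can be sketched on a single target/template alignment: check the identity level, then split the alignment into ungapped stretches (candidates for copying core coordinates from the template) and stretches containing indels (to be built as loops). This is a toy simplification; real conserved-region identification uses multiple templates and structural information, and the sequences below are invented.

```python
from itertools import groupby


def percent_identity(target_aln, template_aln):
    """Percent identical residues over the columns without gaps."""
    pairs = [(a, b) for a, b in zip(target_aln, template_aln)
             if a != "-" and b != "-"]
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def split_regions(target_aln, template_aln):
    """Return (label, start, end) column ranges: 'core' or 'variable'."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    labels = ["variable" if "-" in (a, b) else "core"
              for a, b in zip(target_aln, template_aln)]
    regions, start = [], 0
    for label, group in groupby(labels):
        length = len(list(group))
        regions.append((label, start, start + length))
        start += length
    return regions


target = "ACD--EFG"     # hypothetical aligned target sequence
template = "ACDKLQF-"   # hypothetical aligned template sequence
print(percent_identity(target, template))
print(split_regions(target, template))
```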

Loops - from databases

Restricted set of CDR3 main chain conformations

Success rate

Swiss-Model http://swissmodel.expasy.org/

Submit your own sequence, and get a 3D model back (if there are templates available…)

Alphavirus Spike-NC Binding

SFCP Model vs. Xtal

Conserved Residues in Hydrophobic Binding Pocket

Structure Validation

• Biochemically reasonable

• Good stereochemistry, with main chain in acceptable Ramachandran regions

• Planar peptide bonds

• Hydrogen bonding of buried residues

• Apolar and polar residues properly accommodated

Exceptions do exist

The B1 fold has been changed to a Rop-like protein by changing 50% of the amino acids

1994: a $1000 prize was offered for changing a fold by changing no more than 50% of the aa

1997: Lynne Regan, Yale, won the prize

B1 domain of protein G

Rop dimer

“JANUS”

56 aa – 28 may change!

B1 and Rop have only three identical aa positions

Change key amino acids:

e.g. in Rop, Arg16 & Asp45 form a salt bridge

No structure for Janus yet, but CD and NMR spectra indicate clear similarities to Rop, including dimer formation

Designing Protein/Peptide

• May be easier to find an aa-sequence which adopts a specific fold than the opposite, i.e. to find the fold of a given sequence

• Zinc-finger peptide design: allow only certain types of aa in a given region

  core: Ala, Val, Leu, Ile, Phe, Tyr, Trp

  surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, Arg

  boundary: allow both core and surface sets

  no Pro, Cys, Met; Gly at special positions

try combinations of these, and evaluate their energy in computer
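The restricted alphabets make the combinatorics easy to estimate. The sketch below encodes the core, surface and boundary sets from the slide in one-letter code and counts how many sequences a given assignment of positions allows; the position assignment in the example is hypothetical.

```python
from math import prod

CORE = set("AVLIFYW")        # Ala, Val, Leu, Ile, Phe, Tyr, Trp
SURFACE = set("ASTHDNEQKR")  # Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, Arg
BOUNDARY = CORE | SURFACE    # both sets allowed

ALLOWED = {"core": CORE, "surface": SURFACE, "boundary": BOUNDARY}


def sequence_space_size(position_classes):
    """Number of sequences when each position draws from its class's set."""
    return prod(len(ALLOWED[c]) for c in position_classes)


# Hypothetical 6-residue stretch: two core, three surface, one boundary position.
classes = ["core", "surface", "surface", "core", "boundary", "surface"]
print(len(BOUNDARY), "residue types allowed at boundary positions")
print(sequence_space_size(classes), "candidate sequences")
```

Even with restricted alphabets the count grows exponentially with length, which is why the candidates are evaluated by an energy function in the computer rather than one by one in the lab.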

Designed peptide

Can one design a peptide with a zinc-finger fold, without the zinc?

Real Zn-finger vs. designed peptide (hydrophobic stabilization)

Profiles

A profile is a compilation of additional information about a sequence - it can even take into account 3D information if a 3D structure is known.

Such profiles can be used to evaluate relationships between proteins or protein families, even for distantly related proteins with little sequence similarity

Threading

This asks the inverse question of protein folding:

Given a 3D structure, which amino acid sequences are compatible with the structure?

Thread your sequence through representative set of folds and see if there is a match.

Easier, but not easy… we cannot always tell if a sequence fits a structure - our scoring or energy functions are not accurate enough
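A deliberately crude sketch of the threading idea: each fold is reduced to a string of buried (B) and exposed (E) positions, and a sequence scores well if its hydrophobic residues land on buried positions. The scoring rule, the hydrophobic set and the tiny fold library are all assumptions; real threading uses far richer environment descriptions and allows gaps.

```python
HYDROPHOBIC = set("AVLIFYWM")  # assumed hydrophobic residue set


def threading_score(sequence, burial):
    """+1 when residue type suits the position's environment, else -1."""
    score = 0
    for aa, env in zip(sequence, burial):
        is_hydrophobic = aa in HYDROPHOBIC
        is_buried = env == "B"
        score += 1 if is_hydrophobic == is_buried else -1
    return score


def best_fold(sequence, fold_library):
    """Name of the best-scoring fold among those of matching length."""
    scores = {name: threading_score(sequence, env)
              for name, env in fold_library.items()
              if len(env) == len(sequence)}
    return max(scores, key=scores.get)


library = {"foldA": "BEBE", "foldB": "EBEB"}  # hypothetical burial patterns
print(threading_score("LKLK", "BEBE"))
print(best_fold("LKLK", library))
```

With a score this simple many unrelated sequences fit equally well, which illustrates the limitation stated above: the scoring functions are not accurate enough to always tell whether a sequence fits a structure.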

Structural Genomics

Similar Sequence → Similar Structure

The Protein Universe with protein “families”

[Figure labels: RMSD, known structures]

Structural Genomics: Which proteins?

Lab 2 goals

• Measure conformational details (distances, angles) in a protein

• Ramachandran plots

• Extract information from coordinate file header

• Investigate disulfide bonds

• RasMol scripts
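The geometric quantities measured in the lab can also be computed directly from atomic coordinates. A minimal sketch with helper names invented for this example; coordinates are (x, y, z) tuples in Å.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p, q):
    """Distance between two atoms."""
    return math.dist(p, q)


def angle(p, q, r):
    """Angle p-q-r in degrees, with q at the vertex."""
    v1, v2 = _sub(p, q), _sub(r, q)
    cos_a = _dot(v1, v2) / math.sqrt(_dot(v1, v1) * _dot(v2, v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))


def dihedral(p0, p1, p2, p3):
    """Torsion angle about the p1-p2 bond, in degrees (-180 to 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Ramachandran angles for residue i:
#   phi = dihedral(C(i-1), N(i), CA(i), C(i))
#   psi = dihedral(N(i), CA(i), C(i), N(i+1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(cis, trans)
```

The same `distance` function can be used to look for disulfide bonds by checking the separation of cysteine sulfur atoms, and `dihedral` applied along the backbone gives the points of a Ramachandran plot.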
