Biomolecular Modeling: building a 3D protein structure from its sequence

Prof Shoba Ranganathan
Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia &
Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore
([email protected])


Why protein structure?

In the factory of the living cell, proteins are the workers, performing a variety of tasks
Each protein adopts a particular folding pattern that determines its function
The 3D structure of a protein brings into close proximity residues that are far apart in the amino acid sequence

How does a protein fold?

Most newly synthesized proteins fold without assistance!
Ribonuclease A: the denatured protein could refold and recover its activity (C. Anfinsen, 1966)
“Structure implies function”
The amino acid sequence encodes the protein’s structural information


1. Understanding Protein Structure

2. A Quick Overview of Sequence Analysis

3. Finding a Structural Homologue

4. Template Selection

5. Aligning the Query Sequence to Template Structure(s)

6. Building the Model

The basics

Proteins are linear heteropolymers: one or more polypeptide chains
Repeat units: 20 amino acid residues
Chain lengths range from a few tens to thousands of residues
Three-dimensional shapes (“folds”) adopted vary enormously
Experimental methods: X-ray crystallography, electron microscopy and NMR (nuclear magnetic resonance)

The (L-)amino acid

[Figure: an L-amino acid, showing the backbone with its amino (+) and carboxylate (-) groups and the side chain R = H, CH3, …]

The peptide bond

Coplanar atoms

Levels of protein structure

Zeroth: amino acid composition
Primary: this is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself

Levels of protein structure

Secondary: local organization of the protein backbone: α-helix, β-strand (which assemble into β-sheets), turn and interconnecting loop

Ramachandran / phi-psi plot

[Figure: phi-psi plot with labelled regions: α-helix (right-handed), β-sheet, α-helix (left-handed)]
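The axes of a Ramachandran plot are the backbone dihedral angles: phi is defined by the atoms C(i-1), N(i), Cα(i), C(i) and psi by N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of the dihedral calculation in plain Python. The function name `dihedral` and its helpers are illustrative and are not taken from the slides.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Keep only the components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Four atoms in a plane, cis arrangement: the dihedral is 0 degrees
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
```

For phi, the four points would be the coordinates of C(i-1), N(i), Cα(i) and C(i) read from a PDB entry; for psi, N(i), Cα(i), C(i) and N(i+1).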

Levels of protein structure

Tertiary: packing of secondary structure elements into a compact spatial unit
“Fold” or domain: this is the level to which structure prediction is currently possible

Levels of protein structure

Quaternary: assembly of homo- or heteromeric protein chains
Usually the functional unit of a protein, especially for enzymes

Structural classes

All-α (helical)
All-β (sheet)
α/β (parallel β-sheet)
α+β (antiparallel β-sheet)

Structural information

Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics
http://www.rcsb.org/pdb
> 45,744 structures of proteins
Also contains structures of DNA, carbohydrates, protein-DNA complexes and numerous small ligand molecules.

The PDB data

Text files
Each entry is identified by a unique 4-character code: say 1emg
1emg entry
  Header information
  Atomic coordinates in Å (1 Ångström = 1.0e-10 m)

PDB Header details

Identifies the molecule, any modifications, date of release of PDB entry
Organism, keywords, method
Authors, reference, resolution if X-ray structure
Sequence, cross-reference to sequence databases

HEADER GREEN FLUORESCENT PROTEIN 12-NOV-98 1EMG
TITLE GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T
TITLE 2 SUBSTITUTION, Q80R)
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: GREEN FLUORESCENT PROTEIN;
COMPND 3 CHAIN: A;
COMPND 4 ENGINEERED: YES;
COMPND 5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R
COMPND 6 SUBSTITUTION;
COMPND 7 BIOLOGICAL_UNIT: MONOMER

The data itself

ATOM 1 N SER A 2 29.089 9.397 51.904 1.00 81.75
ATOM 2 CA SER A 2 27.883 10.162 52.185 1.00 79.71
ATOM 3 C SER A 2 26.659 9.634 51.463 1.00 82.64
ATOM 4 O SER A 2 26.718 8.686 50.686 1.00 81.02
ATOM 5 CB SER A 2 28.039 11.660 51.932 1.00 75.59
ATOM 6 OG SER A 2 27.582 12.038 50.639 1.00 43.28
-------
ATOM 1737 CD1 ILE A 229 39.535 21.584 52.346 1.00 41.62
TER 1738 ILE A 229

Coordinates for each heavy (non-hydrogen) atom from the first residue to the last
Any ligands (starting with HETATM) follow the biomacromolecule
O of water molecules (also HETATM) at the end
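In an actual PDB file the ATOM records are laid out in fixed columns, so they are safer to read by character position than by splitting on whitespace. The sketch below assumes the standard column layout of the PDB format (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54, occupancy in 55-60, B-factor in 61-66). The helper names `format_atom_line` and `parse_atom_line` are hypothetical.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Write one ATOM record in the fixed-column PDB layout (66 characters)."""
    # One- to three-letter atom names conventionally start in column 14
    padded_name = name if len(name) == 4 else " " + name
    return ("ATOM".ljust(6) + str(serial).rjust(5) + " "
            + padded_name.ljust(4) + " " + res_name.rjust(3) + " "
            + chain + str(res_seq).rjust(4) + "    "
            + f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")

def parse_atom_line(line):
    """Read the main fields of an ATOM or HETATM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }

# First atom of the 1emg excerpt shown on the slide
line = format_atom_line(1, "N", "SER", "A", 2, 29.089, 9.397, 51.904, 1.00, 81.75)
atom = parse_atom_line(line)
```

The last two numeric columns of each record on the slide (for example 1.00 and 81.75) are the occupancy and the temperature factor.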

Structural Families

SCOP: Structural Classification Of Proteins
http://scop.mrc-lmb.cam.ac.uk/scop
FSSP: Family of Structurally Similar Proteins
http://www.ebi.ac.uk/dali/fssp/
CATH: Class, Architecture, Topology, Homology
http://www.biochem.ucl.ac.uk/bsm/cath

Structure comparison facts

Proteins adopt a limited number of topologies.
Homologous sequences show very similar structures, with strong conservation in secondary structural elements; variations occur in non-conserved regions.
In the absence of sequence homology, some folds are preferred by vastly different sequences.

Structure comparison facts

The “active site” (a collection of functionally critical residues) is remarkably conserved, even when the protein fold is different.
Structural models (especially those based on homology) provide insights into possible function for new proteins.
Implications for protein engineering, ligand/drug design, and function assignment of genomic data.

Visualizing PDB information

RASMOL: most popular, available for all platforms (Sayle et al, 2005)
http://www.bernstein-plus-sons.com/software/rasmol
DeepView Swiss-PDBViewer: from Swiss-Prot (Guex & Peitsch, 1997)
http://tw.expasy.org/spdbv/
Chemscape Chime Plug-in: for PC and Mac
http://www.mdli.com/products/framework/chemscape
PyMOL: very good, available for all platforms (DeLano, W.L. The PyMOL Molecular Graphics System, 2002)
http://pymol.sourceforge.net

RASMOL views - SH2 domain

[Figure: all-atom model and space-filling model. Atom colors: N O C S]

RASMOL views – 1sha

[Figure: Cα trace (rainbow coloring: N to C) and ribbon (coloring: by structural units)]

Homologous folds

Hemoglobin and erythrocruorin: 31% sequence identity

Analogous folds

Hemoglobin and phycocyanin: 9% sequence identity

Surface Properties

Cro repressor – DNA complex
Basic residues in blue
Acidic residues in red

Mapping Functional Regions

Immunoglobulin light chain - dimer
Hydrophobic residues in magenta
Hydrophilic and charged residues in cyan

2. A Quick Overview of Sequence Analysis

Siblings and Cousins

Siblings or homologues: sequences with at least 30% sequence identity over an alignment length of at least 125 residues and conservation of function.
Cousins or paralogues: < 30% identity but with conservation of function
Both show structural conservation
Homologues located using a database search tool such as BLAST (free webserver): http://www.ncbi.nlm.nih.gov/BLAST
Paralogues require a more sensitive method such as PSI-BLAST
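The 30% identity and 125-residue thresholds above can be checked directly on a pairwise alignment. This is a rough sketch only: it assumes identity is measured over columns where neither sequence has a gap (other denominators are in use), and it cannot test conservation of function, which the slide also requires. The function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of aligned columns) for two gapped sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        aligned += 1
        if a == b:
            identical += 1
    identity = 100.0 * identical / aligned if aligned else 0.0
    return identity, aligned

def meets_sibling_thresholds(aligned_a, aligned_b):
    """True if identity >= 30% over at least 125 aligned residues."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= 30.0 and length >= 125

# Toy alignment: 5 aligned columns, 4 identical
identity, length = percent_identity("ACDE-F", "ACDQGF")
```

A pair that passes this test is only a candidate sibling; function still has to be compared separately.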

Multiple Sequence Alignment

Finding the best way to match the residues of related sequences
Identical residues must be lined up
The rest should be arranged, based on
  observed substitution in protein families
  chemical similarity
  charge similarity
Where it is impossible to get the residues to line up, the biological concept of insertion/deletion is invoked: the ‘gap’ in alignments

MSA Methods

CLUSTALW / CLUSTALX (Thompson et al, 1997): freely available for all platforms and one of the best alignment programs
http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
MAXHOM (Sander & Schneider, 1991): alignment based on maximum homology; available via the PredictProtein webserver, free for academics
http://cubic.bioc.columbia.edu/predictprotein/
MALIGN (Johnson et al, 1994): freely available UNIX program, based on the structural alignment of protein families
http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html

Alignment Checks

Conservation of functionally important residues: e.g. the catalytic triad (Asp-Ser-His) that is essential for serine proteinase activity
Line up of structurally important residues: e.g. cysteines forming disulfide bonds
Overall, maximizing the alignment of “like” residues
Completely conserved residues usually indicate some conserved structural or functional role, especially buried charges

Sequence Motifs & Patterns

From the analysis of the alignment of protein families
Conserved sequence features, usually associated with a specific function
PROSITE (Hulo et al, 2006) database for protein “signature” patterns: http://www.expasy.ch/prosite

Aligned Sequence Families

From alignments of homologous sequences:
  PRINTS
  PRODOM: http://www.toulouse.inra.fr/prodom.html
From Hidden Markov Model based methods:
  PFAM: http://www.sanger.ac.uk/Pfam

Protein Domains

Most proteins are composed of structural subunits called domains
A domain is a compact unit of protein structure, usually associated with a function.
It is usually a “fold” - in the case of monomeric soluble proteins.
A domain comprises normally only one protein chain: rare examples involving 2 chains are known.
Domains can be shared between different proteins: like a LEGO block

Protein Architectures

Beads-on-a-string: sequential location: tyrosine-protein kinase receptor TIE-1 (immunoglobulin, EGF, fibronectin type-3 and protein kinase).
Domain insertions: “plugged-in” - pyruvate kinase (1pyk)
SMART (Simple Modular Architecture Retrieval Tool): smart.embl-heidelberg.de

Dissection into Domains

A sequence of more than about 125 residues should be routinely checked to see how many domains are present.
Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence
E.g. NP_002917 shows similarity to G-protein regulators

3. Finding a Structural Homologue

Structural Homologues

BLASTP vs. PDB database or PSI-BLAST: look for 4-character PDB ID
E < 0.005
Domain coverage: at least 60% coverage is recommended
Gaps: we don’t want them. Choose between: few gaps and reasonable similarity scores or lots of gaps and high similarity scores?
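The E-value and coverage criteria above amount to a simple filter over a list of search hits. The sketch below uses a hypothetical `Hit` record (it is not the output format of any BLAST client) and ranks the surviving hits by fewest gaps, then lowest E-value. That ordering is one possible answer to the gaps-versus-score question on the slide, not a rule the slide states.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    pdb_id: str        # 4-character PDB ID of the hit
    evalue: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    gaps: int          # number of gap positions in the alignment

def select_templates(hits, query_length, max_evalue=0.005, min_coverage=0.60):
    """Keep hits with E below max_evalue that cover enough of the query."""
    kept = []
    for hit in hits:
        coverage = (hit.query_end - hit.query_start + 1) / query_length
        if hit.evalue < max_evalue and coverage >= min_coverage:
            kept.append((hit, coverage))
    # Assumed ranking: fewest gaps first, then lowest E-value
    kept.sort(key=lambda pair: (pair[0].gaps, pair[0].evalue))
    return kept

# Made-up hits for a 200-residue query; the IDs are placeholders
hits = [
    Hit("1aaa", 1e-30, 1, 190, 12),
    Hit("1bbb", 1e-20, 10, 180, 2),
    Hit("1ccc", 1e-40, 1, 80, 0),    # fails coverage (40%)
    Hit("1ddd", 0.5, 1, 200, 0),     # fails E-value
]
selected = [hit.pdb_id for hit, _ in select_templates(hits, query_length=200)]
```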

Small Proteins: Disulfide bonds

BLAST-type methods may not locate homologues, if Conserved Domain search is not turned on.
Are the Cys residues conserved?
Gaps: where are they on the structure?

gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein) 'four-disulfide core'.
CD-Length = 46 residues, 100.0% aligned
Score = 43.9 bits (102), Expect = 1e-06

Q: 49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97
D:  1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP 46
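The question "Are the Cys residues conserved?" can be answered by walking the alignment column by column. The sketch below uses the query (Q) and domain (D) strings from the conserved-domain hit above; the function name `cysteine_columns` is illustrative.

```python
QUERY = "KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP"
DOMAIN = "KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP"

def cysteine_columns(query, subject):
    """Split Cys-containing alignment columns into conserved and non-conserved."""
    conserved, lost = [], []
    for col, (q, s) in enumerate(zip(query, subject)):
        if q == "C" or s == "C":
            # Conserved only if both sequences have Cys in this column
            (conserved if q == s else lost).append(col)
    return conserved, lost

conserved, lost = cysteine_columns(QUERY, DOMAIN)
```

For this pair all eight cysteines line up, which is consistent with a four-disulfide core, and the single 3-residue gap falls between the first and second Cys.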

Metal-binding domains

C2H2 Zinc Finger
2 Cys & 2 His binding to Zinc
Not detected even by CD-search in BLAST
Detected by Pfam & SMART
Sequence Pattern: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
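A sequence pattern of this kind translates almost directly into a regular expression. In the sketch below, X becomes `.`, X(1-5) becomes `.{1,5}`, and [H/C] becomes `[HC]`. The slide does not define `#`; it is assumed here to mean a hydrophobic residue, and the residue set chosen for it is a guess. The test sequence is an illustrative C2H2-like string, not data from the slides.

```python
import re

# Assumption: '#' stands for a hydrophobic residue; this set is a guess
HYDROPHOBIC = "[AVLIMFYW]"

C2H2_PATTERN = re.compile(
    HYDROPHOBIC + "." + "C" + ".{1,5}" + "C" + ".{3}"
    + HYDROPHOBIC + ".{5}" + HYDROPHOBIC + ".{2}"
    + "H" + ".{3,6}" + "[HC]"
)

def find_c2h2(sequence):
    """Return (1-based start, matched segment) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in C2H2_PATTERN.finditer(sequence)]

matches = find_c2h2("RPYACPVESCDRRFSRSDELTRHIRIHTGQKP")
```

A regular expression only reports exact matches to the pattern, so it will miss divergent fingers that profile-based resources such as Pfam and SMART can still detect.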

Structure Prediction Methods

Secondary Structure Prediction: identify local structural elements such as helices, strands and loops.
> 75% accuracy achievable
PredictProtein or PHD: http://cubic.bioc.columbia.edu/pp/
PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
SSPro: http://promoter.ics.uci.edu/BRNN-PRED/

Folds from Secondary Structure Predictions

Assembling secondary structure elements (SSEs) into folds is a combinatorial problem
Current methods depend on available structural data for mapping predictions:
  FORESST: http://abs.cit.nih.gov/foresst/foresst.html
  TOPITS from the PHD server: http://cubic.bioc.columbia.edu/pp

Tertiary Structure Prediction

Fold recognition/Threading: < 20% identity typically
Best results obtained by combining several database search and knowledge-based tools:
  3D-PSSM: http://www.sbg.bio.ic.ac.uk/~3dpssm/
  FUGUE: http://www-cryst.bioc.cam.ac.uk/fugue/

4. Template Selection

One or many templates?

Sequence similarity: extract template sequences and align with query: select the most similar structure
Completeness: Missing data?

REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465 M RES C SSSEQI
REMARK 465 MET A 1
REMARK 465 THR A 230

REMARK 470 M RES CSSEQI ATOMS
REMARK 470 GLU A 5 OE2
REMARK 470 GLU A 6 CG CD OE1 OE2
REMARK 470 GLU A 17 OE1
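Template completeness can be screened automatically by collecting the residues listed under REMARK 465. The sketch below splits each record on whitespace, which is enough for the excerpt above; it deliberately skips residue numbers that carry an insertion code, and the function name `missing_residues` is hypothetical.

```python
def missing_residues(pdb_lines):
    """Collect (residue name, chain, residue number) from REMARK 465 records."""
    found = []
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        # Data rows have 3 fields (or 4 with a model number) and end in an
        # integer residue number; header and prose rows do not.
        if len(fields) in (3, 4) and fields[-1].lstrip("-").isdigit():
            res_name, chain, number = fields[-3:]
            found.append((res_name, chain, int(number)))
    return found

excerpt = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE",
    "REMARK 465",
    "REMARK 465 M RES C SSSEQI",
    "REMARK 465 MET A 1",
    "REMARK 465 THR A 230",
    "REMARK 470 GLU A 5 OE2",
]
missing = missing_residues(excerpt)
```

REMARK 470 (residues with missing atoms) could be read the same way, with the extra atom names kept as a list per residue.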

One or many templates?

X-ray or NMR?
  X-ray structure with the best resolution (lowest value in Å)
  Prefer X-ray, then NMR
  NMR: average over the ensemble
One or many?
  Structure alignment of Cα atoms
  If 2 templates are very close, keep only one
  Keep templates that provide new information

Many templates

Sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment (SA).
For model building,
  1. align templates structurally
  2. extract the corresponding SSA

5. Aligning the Query Sequence to the Template Structure(s)

Query - Template Alignment

>40% identity: any alignment method is OK
Below this, checks are essential.
Collect close sequence homologues (about 10) and align to query to get MSA (multiple sequence alignment)
Collect several structural templates (at least 5) and align them using structure comparison methods: extract the SSA (structural sequence alignment)
Align MSA to SSA using profile alignment
Extract query and selected template(s) from the final alignment: this is the query-template alignment (QTA).

QTA Checks

Residue conservation checks
  Functional regions
  Patterns/motifs conserved?
Indels
  Combine gaps separated by few residues
Editing the alignment
  Move gaps from secondary structures to loops
  Within loops, move gaps to loop ends, i.e. turnaround point of backbone
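Before editing a QTA by hand, it helps to list the gaps that fall inside template helices or strands, since those are the ones to move into loops. The sketch below assumes the template's secondary structure is available as a string with one character per template residue ('H' helix, 'E' strand, anything else loop). This encoding and the function name are assumptions for illustration, not part of any tool named in the slides.

```python
def gaps_in_secondary_structure(aligned_query, aligned_template, template_ss, gap="-"):
    """Return alignment columns where an indel falls inside a helix or strand."""
    flagged = []
    ss_index = 0  # position in the ungapped template sequence
    for col, (q, t) in enumerate(zip(aligned_query, aligned_template)):
        if t == gap:
            # Insertion in the query: judge it by the flanking template residues
            left = template_ss[ss_index - 1] if ss_index > 0 else "-"
            right = template_ss[ss_index] if ss_index < len(template_ss) else "-"
            if left in "HE" and left == right:
                flagged.append(col)
            continue
        if q == gap and template_ss[ss_index] in "HE":
            flagged.append(col)  # deletion inside a helix or strand
        ss_index += 1
    return flagged

# Toy template of 9 residues: a helix, a 2-residue loop, then a strand
template_ss = "HHHHLLEEE"
in_helix = gaps_in_secondary_structure("AC-EFGHIK", "ACDEFGHIK", template_ss)
in_loop = gaps_in_secondary_structure("ACDE-GHIK", "ACDEFGHIK", template_ss)
```

Columns returned by this check are candidates for manual shifting; the decision about where within a loop to place the gap still needs visual inspection of the structure.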


Visual Inspection of Indels

[Figure: 2-residue deletion as placed by the sequence alignment, compared with an end-of-loop 2-residue deletion]

6. Building the Model

Input for Model Building

Query sequence
Template structure
Template sequence
Query-template sequence alignment

Methods Available

1. WHATIF (Vriend G, 1990):
  High quality models where template is available
  Indels not modelled
  Side chain rotamers
  In silico mutations
  In silico disulfide bond creation

Methods Available

2. SWISS-MODEL (Schwede et al, 2003):
  Automatic modeling mode with multiple templates
  Query + template input
  High homology situations
  DeepView for input file creation

Methods Available

3. MODELLER (Sali & Blundell, 1993):
  High quality models
  Sequence alignment
  Structure analysis/alignment
  Multiple templates
  Multiple chains
  Ligand/cofactor present
4. ESyPred3D (uses MODELLER):
  QTAs from several methods & neural networks
  http://www.fundp.ac.be/urbm/bioinfo/esypred/

Methods Available

5. ICM (Abagyan et al, 1994):
  High quality models
  Loop modelling
  Multiple templates not possible
  Sequence/Structure alignment/analysis
  Ab initio peptide modeling
  Secondary structure prediction
6. Geno3D (Combet et al, 2002):
  Automated modelling
  Distance geometry used for loops
  http://geno

Methods Available

7. 3D-JIGSAW (Bates et al, 2001):
  Automatic modeling mode
  Interactive user mode to select templates
  Multiple templates
  Multidomain protein modeling

Methods Available

8. CPH-MODELS (Lund et al, 1997):
  Fully automated
  FASTA search for templates
  Not validated

Automatic or Manual Mode?

Automatic: High homology
Manual:
  Medium/Low homology
  Template from structure prediction
  Multiple templates
  Multiple chains
  Ligand present

How good is the model?

Structural Quality Analysis
  PROCHECK (Laskowski et al, 1993)
  WHATIF (Vriend G, 1990)
  ERRAT (Colovos & Yeates, 1993)

Improving ill-defined regions

Iterative model building
  Rebuild or anneal bad regions
  Check/edit alignment and rebuild
Molecular dynamics and/or Monte Carlo simulations
  Compute intensive
  Input files need to be set up
  Optional

Molecular Modeling Protocol

Resources required
  The query sequence
  Personal computer with internet connectivity
  RASMOL/DeepView for PDB structure visualization
  CLUSTALX sequence alignment software
  Access to a UNIX workstation
  MODELLER/ICM – UNIX software
  WHATIF – UNIX/PC software
  PROCHECK – UNIX or Windows software

MM Protocol – Input Files

Minimum requirement
  Query sequence
  Template structure
  Template sequence
  Query-Template alignment

Ex 1. High Homology Case

Human SOX9 WT - homologous to SRY (PDB: 1HRY) - 49% identity
S9WT: ..AGAACAATGG.. highest
SOXCORE: ..GCAACAATCT.. least
Mutants (campomelic dysplasia):
  F12L: No DNA binding
  H65Y: Minimal binding
  P70R: altered specificity; no SOXCORE
  A19V: near WT but normal binding

1. SOX9 Models

WT & P70R models built
Cα overlay: WT-SRY ~ 0.72 Å
J. Biol. Chem. 274 (1999) 24023
[Figure: SOX9 & P70R based on SRY]
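The figure quoted for a Cα overlay is a root-mean-square deviation (RMSD) over paired Cα atoms. The sketch below computes only that final number and assumes the two structures have already been superposed (for example by the modeling or viewing program) and that the Cα atoms are paired one-to-one; it does not perform the superposition itself.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD in Å between paired coordinates that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Two toy "structures" of two atoms each, offset by 1 Å along z
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(model, template)
```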

Ex 1. SOX9 + DNA Models

Observed disease-linked mutations mapped
Other residues in DNA-binding groove determined
[Figure panels: SOX9-WT, SOX9-P70R, SRY]

Ex 2. Low Homology Situation

Pigments from reef-building corals: similar to Pocilloporin
  fluoresce under UV and visible radiation
  similar to the Green Fluorescent Protein - GFP (19.6% identity)
  contain ‘QYG’ instead of ‘SYG’ in GFP, as proposed fluorophore

2. Alignment of POC4 & GFP

2. POC4 Model

Barrel ends open
C-ter not included
β-sheet OK
‘QYG’ fits the site!
26 residues within 5 Å of QYG (only 19 in GFP)
Increased thermal stability
UV protection
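The count of residues within 5 Å of the fluorophore can be obtained from model coordinates with a simple distance search. A minimal sketch follows, assuming atoms are supplied as (residue number, x, y, z) tuples and that a residue counts as a neighbour if any of its atoms lies within the cutoff of any fluorophore atom. The residue numbers and coordinates are hypothetical.

```python
import math


def residues_within(atoms, site_ids, cutoff=5.0):
    """Residue numbers having at least one atom within cutoff of any site atom."""
    site_atoms = [a for a in atoms if a[0] in site_ids]
    near = set()
    for res_id, x, y, z in atoms:
        if res_id in site_ids:
            continue  # the site residues themselves are not neighbours
        for _, sx, sy, sz in site_atoms:
            if math.dist((x, y, z), (sx, sy, sz)) <= cutoff:
                near.add(res_id)
                break
    return near


# Hypothetical atoms: residues 65-67 stand in for the tripeptide site
atoms = [
    (65, 0.0, 0.0, 0.0), (66, 1.5, 0.0, 0.0), (67, 3.0, 0.0, 0.0),
    (10, 0.0, 4.0, 0.0), (11, 0.0, 9.0, 0.0), (12, 3.0, 3.0, 0.0),
]
neighbours = residues_within(atoms, {65, 66, 67})
print(sorted(neighbours), len(neighbours))
```

The same function applied to the model and to the template gives the two counts being compared, provided both use the same cutoff and atom selection.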

Ex 3. Small Disulfide-bonded Protein: Complement Factor H

20 tandem homologous units = SCRs (short consensus repeat) or “sushi” regions

Each SCR is ~ 60 aa: conserved Y, P, G
2 disulfide bridges: 1-3 & 2-4
Linkers of 3-8 aa

Heparin binding SCRs: 7 (high affinity) & 20

Previous SCRs required for activity: minimum constructs are fH67 and fH18-20

Figure labels: C1, C2, C3, C4, N-ter, C-ter, HV loop
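The SCR disulfide pattern described for factor H (bridges between cysteines 1-3 and 2-4) can be turned into residue pairs once the four cysteines of a module are located. This is a minimal sketch under the assumption that the input is a single SCR containing exactly four cysteines; the sequence below is a hypothetical toy, not a real factor H module.

```python
def scr_disulfide_pairs(scr_sequence: str):
    """Return 1-based residue pairs for the 1-3 and 2-4 disulfide bridges."""
    cys = [i + 1 for i, aa in enumerate(scr_sequence) if aa == "C"]
    if len(cys) != 4:
        raise ValueError(f"expected 4 cysteines in one SCR, found {len(cys)}")
    c1, c2, c3, c4 = cys
    return [(c1, c3), (c2, c4)]


# Hypothetical single-module sequence with four cysteines
toy_scr = "ACPDYGKVSCTEGWSLIGNATCQPNGTWSPKCV"
print(scr_disulfide_pairs(toy_scr))
```

Pairs produced this way are the kind of restraint that can be supplied to a modelling program so that the bridges are kept in the built model.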

3. Sequence Alignment of Close Functional Homologues

hfH 385 C L R K C Y F P Y L E N G Y N Q N H G R K F V Q G K S I D V A

fHR-3 83 C L R K C Y F P Y L E N G Y N Q N Y G R K F V Q G N S T E V A

bfH 292 C L R Q C I F N Y L E N G H N Q H R E E K Y L Q G E T V R V H

mfH 385 C V R K C V F H Y V E N G D S A Y W E K V Y V Q G Q S L K V Q

Consensus * : * : * * * : * * * . . : : * * : : *

hfH 416 C H P G Y A L P K A - Q T T V T C M E N G W S P T P R C I R 444

fHR-3 114 C H P G Y G L P K V R Q T T V T C T E N G W S P T P R C I R 143

bfH 323 C Y E G Y S L Q N D - Q N T M T C T E S G W S P P P R C I R 351

mfH 416 C Y N G Y S L Q N G - Q D T M T C T E N G W S P P P K C I R 444

Consensus * : * * . * : * * : * * * . * * * * . * : * * *

Site A Site d Site B Site c
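The '*' entries of the consensus row mark columns where all four sequences carry the same residue. A minimal sketch that recomputes those marks for the first alignment block is shown here. It reproduces only the full-identity marks; the ':' and '.' symbols depend on residue similarity groups, which are not implemented.

```python
def conserved_columns(rows):
    """Return a string with '*' under columns where every row has the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# First alignment block, residues 385-415 of hfH and the matching rows
block = {
    "hfH":   "CLRKCYFPYLENGYNQNHGRKFVQGKSIDVA",
    "fHR-3": "CLRKCYFPYLENGYNQNYGRKFVQGNSTEVA",
    "bfH":   "CLRQCIFNYLENGHNQHREEKYLQGETVRVH",
    "mfH":   "CVRKCVFHYVENGDSAYWEKVYVQGQSLKVQ",
}
marks = conserved_columns(list(block.values()))
for name, seq in block.items():
    print(f"{name:<6}{seq}")
print(f"{'':<6}{marks}")
print(marks.count("*"), "fully conserved columns")
```

Printing the marks under the rows restores the column positions that are needed to read the consensus line against the sequences.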

3. Templates for fH SCRs 6-7

hfH SCRs 15&16 (fH1516; PDB ID: 1HFH)
Vaccinia virus complement control protein domains 3&4 (vcp34; PDB ID: 1VVC)
Orientations differ considerably
vcp34 is 28% identical to hfH67, compared to hfH1516 (25%)!

Figure labels: hfH15, hfH16, vcp3, vcp4

3. Query-Templates alignment

3. hfH67 model

Figure labels: hfH67, hfH1516, sialic acid, heparin disaccharide repeat

3. Locating residues for mutation from model

Figure labels: Arg-404, Arg-387, Lys-405, His-402, Lys-410, Lys-388 (panels: SCRs 6&7, SCRs 15&16)

Pacific Symposium on Biocomputing 2000, 5:155

Ex 4: Protein Engineering

Thermolysin-like protease unstable at high temperatures (> 40 ºC unlike trypsin)
Homology model built
G8 & N60 suited for disulfide bond
Double mutant functional at 92.5 ºC

J. Mansfeld et al. “Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond” J. Biol. Chem. 1997 272: 11152-11156.
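Picking a residue pair such as G8 and N60 from a homology model amounts to screening for positions whose backbones are close enough to be bridged. The sketch below is one simple way to do that, not the method of the cited paper. It screens on Cα-Cα distance only (glycine has no Cβ), and the 7.0 Å cutoff and minimum sequence separation are assumed rule-of-thumb values. The coordinates are hypothetical.

```python
import math
from itertools import combinations


def disulfide_candidates(ca_coords, max_ca_dist=7.0, min_seq_sep=3):
    """List (res_i, res_j, distance) for residue pairs whose Cα atoms are close.

    ca_coords maps residue number -> (x, y, z) of the Cα atom.
    Both thresholds are assumptions chosen for illustration.
    """
    hits = []
    for (ri, pi), (rj, pj) in combinations(sorted(ca_coords.items()), 2):
        if rj - ri < min_seq_sep:
            continue  # neighbours in sequence are close by construction
        d = math.dist(pi, pj)
        if d <= max_ca_dist:
            hits.append((ri, rj, round(d, 2)))
    return hits


# Hypothetical Cα coordinates for four residues of a model
toy_model = {
    8: (0.0, 0.0, 0.0),
    9: (3.8, -1.0, 0.0),
    30: (20.0, 0.0, 0.0),
    60: (0.0, 5.5, 0.0),
}
print(disulfide_candidates(toy_model))
```

A distance screen only shortlists pairs; side-chain geometry and the effect on folding still have to be checked in the model and by experiment.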

Ex 5. Multiple chains: Human Hand, Foot & Mouth Disease Virus capsid

2000 outbreak of HFMD in Singapore: thousands of children affected; 4 deaths (The Lancet, 2000, 356, 1338)

Major etiological agent: EV71 (enterovirus group)

Neurological complications

5. EV71 genome structure

Genome map (5’ to 3’): VPg, AUG, Capsid (VP4 VP2 VP3 VP1), Replication (2A 2B 2C 3A,B 3C 3D), UAG, PolyA
Primers: EV71 specific primers, Pan-enterovirus primers

95/94% (RNA) homology to coxsackievirus A16/B3
Only 1% difference between neurovirulent and non-neurovirulent isolates
Most variations in non-capsid regions
Within capsid regions, VP1 shows maximum variability relative to other EVs
Differences in capsid region 1: VP1 & VP2, 2: VP3

5. Picornaviridae

Icosahedral capsid

5. Template hunt

BLASTP against PDB sequences
VP1: 3 templates
  1BEV 38.7% (bovine enterovirus)
  1EAH 36.5% (poliovirus type 2 strain Lansing)
  1FPN 38.0% (human rhinovirus serotype 2)
VP2: 1BEV 56.7%
VP3: 1BEV 54.9%
VP4: 1BEV* 50.0%
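The template hits listed above can be held in a small table and queried per chain. The sketch below uses the identities from the slide; the 5-point margin used to decide which templates are "comparable" is a hypothetical threshold, chosen only to illustrate why all three VP1 templates might be kept when no single hit stands out. The asterisk on the VP4 entry is omitted because the slide does not explain it.

```python
# Percent identities per capsid protein, as listed on the slide
hits = {
    "VP1": {"1BEV": 38.7, "1EAH": 36.5, "1FPN": 38.0},
    "VP2": {"1BEV": 56.7},
    "VP3": {"1BEV": 54.9},
    "VP4": {"1BEV": 50.0},
}


def best_template(chain_hits):
    """Return (template, identity) for the highest-identity hit."""
    return max(chain_hits.items(), key=lambda item: item[1])


def comparable_templates(chain_hits, margin=5.0):
    """Templates whose identity is within `margin` points of the best hit.

    The margin is an assumed, illustrative threshold.
    """
    top = max(chain_hits.values())
    return sorted(t for t, ident in chain_hits.items() if top - ident <= margin)


for chain, chain_hits in hits.items():
    print(chain, best_template(chain_hits), comparable_templates(chain_hits))
```

For VP1 the three hits fall within about 2 points of each other, so none is clearly preferable on identity alone.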

5. Fixing the VP1 Alignment

Structural alignment of templates: using VAST (Gibrat, Madej, & Bryant, 1996)

Extract corresponding sequence alignment

Match HFMDV VP1 to aligned templates using profile alignment in CLUSTALW

5. VP1 alignment to templates

VP1

1BEV1  14 QAAGALVAGTSTSTHSVATDSTPALQAAETGATSTARDESMIETRTIVPTHGIHETSVESFFGRSSLVGM
1EAH1  24 ---ANNLPDTQSSGPAHS-KETPALTAVETGATNPLVPSDTVQTRHVIQKRTRSESTVESFFARGACVAI
1FPN1  15 ----LVVPNINSSNPTTS-NSAPALDAAETGHTSSVQPEDVIETRYVQTSQTRDEMSLESFLGRSGCIHE
EV711  23 ALPAPTGQNTQVSSHRLDTGEVPALQAAEIGASSNTSDESMIETRCVLNSHSTAETTLDSFFSRAGLVGE
Consensus . . * . . * * * * . * * : . . . : : * * : . : * : : : * * : . * . . :

1BEV1  84 PLLAT------GTSITHWRIDFREFVQLRAKMSWFTYMRFDVEFTIIATSS-TGQNVTTEQHTTYQVMYV
1EAH1  90 IEVDND-----SKLFSVWKITYKDTVQLRRKLEFFTYSRFDMEFTFVVTSNYTDANNGHALNQVYQIMYI
1FPN1  80 SKLEVTLANYNKENFTVWAINLQEMAQIRRKFELFTYTRFDSEITLVPCISAL---SQDIGHITMQYMYV
EV711  93 IDLPLE-GTTNPNGYANWDIDITGYAQMRRKVELFTYMRFDAEFTFVACTP-----TGEVVPQLLQYMFV
Consensus : : * * . * : * * . . * * * * * * * : * : : * * : :

1BEV1 147 PPGAPVPSNQDSFQWQSGCNPSVFADTDGPPAQFSVPFMSSANAYSTVYDGYARFM---DT---DPDRYG
1EAH1 161 PPGAPIPGKWNDYTWQTSSNPSVFYTYGAPPARISVPYVGIANAYSHFYDGFAKVPLAGQASTEGDSLYG
1FPN1 147 PPGAPVPNSRDDYAWQSGTNASVFWQHGQAYPRFSLPFLSVASAYYMFYDGYDE----------QDQNYG
EV711 157 PPGAPKPESRESLAWQTATNPSVFVKLTDPPAQVSVPFMSPASAYQWFYDGYPTFG---EHKQEKDLEYG
Consensus * * * * * * . : . * * : . * . * * * . . : . * : * : : . * . * * . * * * : * *

1BEV1 211 ILPSNFLGFMYFRTLED---AAHQVRFRIYAKIKHTSCWIPRAPRQAPYKKRYNLVFS--G-DSDRICSN
1EAH1 231 AASLNDFGSLAVRVVNDHNPTKLTSKIRVYMKPKHVRVWCPRPPRAVPYYGP-GVDYK--D-GLAP-LPG
1FPN1 207 TANTNNMGSLCSRIVTEKHIHKVHIMTRIYHKAKHVKAWCPRPPRALEYTRAHRTNFKIEDRSIQTAIVT
EV711 224 ACPNNMMGTFSVRTVGSS-KSKYPLVVRIYMRMKHVRAWIPRPMRNQNYLFKANPNYA--GNSIKPTGTS
Consensus * : * : * : . * : * : * * . * * * . * * : . .

1BEV1 275 RASLTSY 281
1EAH1 296 K-GLTTY 301
1FPN1 277 RPIITTA 283
EV711 291 RTAITT- 296
Consensus : : * :

Legend: strands, helices

Pocket-factor binding residues

5. Model building steps

Build all 4 capsid proteins (VP1-VP4) together to ensure 3D fit
Use 1BEV alone for VP2-VP4
For VP1: use aligned 1BEV, 1EAH, 1FPN
Check model

5. Round 1: VP1, VP2, VP3, VP4

Clip hanging ends

Re-position problem loops: adjust gaps in alignment

Build again

5. Round 2: Pentamer Check

Loops look OK
Build pentamer
Publish…. Oops: clash in pentamer assembly. Go back
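The pentamer check can be pictured as generating five symmetry copies of the modelled protomer and counting atom pairs from different copies that sit too close together. The sketch below assumes the 5-fold axis is the z axis through the origin and uses a hypothetical 3.0 Å clash cutoff; the coordinates are toy values, and a real check would use all atoms of the model in the capsid frame.

```python
import math


def rotate_z(point, degrees):
    """Rotate a point about the z axis (assumed here to be the 5-fold axis)."""
    t = math.radians(degrees)
    x, y, z = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z)


def build_pentamer(protomer):
    """Five copies of the protomer, rotated by multiples of 72 degrees."""
    return [[rotate_z(p, 72.0 * k) for p in protomer] for k in range(5)]


def count_clashes(copies, min_dist=3.0):
    """Count atom pairs from different copies closer than min_dist (assumed cutoff)."""
    clashes = 0
    for i in range(len(copies)):
        for j in range(i + 1, len(copies)):
            for p in copies[i]:
                for q in copies[j]:
                    if math.dist(p, q) < min_dist:
                        clashes += 1
    return clashes


# Hypothetical protomers: one far from the axis, one crowding it
far_from_axis = [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
near_axis = [(1.0, 0.0, 0.0)]
print(count_clashes(build_pentamer(far_from_axis)))
print(count_clashes(build_pentamer(near_axis)))
```

Atoms close to the symmetry axis are the ones most likely to collide with their own copies, which is where a clash found at this stage sends the modeller back to the alignment.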

5. Close encounters of the 3rd Kind

Build only VP3 pentamer

N-terminus of each VP3 hydrogen-bonded

Also, in BEV, Asp-Lys ion pair

First 25 aa overlay v. well

5. Fourth foray: build with 5 VP3s

Only first 50aa of the other 4 VP3s included

Model resulted in knots due to insufficient refinement cycles

However, VP3 pentameric region OK

5. Fifth and final attempt

5. Canyon Pit and Antigenic Sites

Figure labels: Cardiovirus, Neurovirulent Polio (mouse), Poliovirus sites

5. Putative antigenic sites

Figure labels: VP2, VP1

5. HFMDV Conclusions

Unique surface loops identified for
  Immunodiagnostic assays
  Vaccine design
Antibodies being generated
Canyon pit: depth is similar to BEV
Mapping the antigenic regions of other related enteroviruses on the HFMDV surface: specific VP1 and VP2 sites buried

Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C. Phoon: Dept. of Microbiology, NUS
Applied Bioinformatics, Vol 1, issue 1, 43-52: invited research article