Bioinformatics and Computational Molecular Biology Geoff Barton http://www.compbio.dundee.ac.uk

Page 1: Bioinformatics and Computational Molecular Biology Geoff Barton

Bioinformatics and Computational Molecular Biology

Geoff Barton

http://www.compbio.dundee.ac.uk

Page 2: Bioinformatics and Computational Molecular Biology Geoff Barton

Practical Tutorial

• Dr David Martin will give a practical tutorial on the use of the PyMOL molecular graphics software.

• In this lecture I will show lots of protein structures. Use www.ebi.ac.uk/msd to find them, and/or the scop domains database (find it with Google).

Page 3: Bioinformatics and Computational Molecular Biology Geoff Barton

Similarities in Proteins

• Lecture 1
  – Overview of data in molecular biology
  – Protein modelling
  – Similarities of Protein Sequence, Structure, Function

Page 4: Bioinformatics and Computational Molecular Biology Geoff Barton

Introduction to Sequence Comparison

• Lecture 2:
  – Why compare sequences?
  – Methods for sequence comparison/alignment.
  – Multiple alignment
  – Database searching - FASTA/BLAST
  – Iterative searching - PSI-BLAST

Page 5: Bioinformatics and Computational Molecular Biology Geoff Barton

Practical/WWW references

• Organised by Drs Martin
  – Good preparation would be to look at http://www.ebi.ac.uk/Tools and http://www.ncbi.nlm.nih.gov
  – Look at BLAST and FASTA on these sites as well as database access facilities.

Page 6: Bioinformatics and Computational Molecular Biology Geoff Barton

Traditional biological research

• Private Data: Past experiments. Lab note books. Group discussions.

• Analysis: Reading. Talking. Thinking.

• Hypothesis!

• Experiment: Design. Execution.

• Publish!

• Public Data: Journals. Conferences.

Page 7: Bioinformatics and Computational Molecular Biology Geoff Barton

Bioinformatics/Computational Biology and biological research

• Private Data: Past experiments. Lab note books. Group discussions. DNA sequences, protein sequences, genetic maps, transcripts, 3D structures, proteomics results, SNP data, etc.

• Analysis: Reading. Talking. Thinking. Computational analysis. Software development.

• Hypothesis! Computer aided.

• Experiment: Design. Execution. Computational experiments. Simulation.

• Publish! Database submission. Database management.

• Public Data: Journals. Conferences. DNA sequences, protein sequences, genetic maps, transcripts, 3D structures, proteomics results, SNP data, etc.

Page 8: Bioinformatics and Computational Molecular Biology Geoff Barton

EMBL Nucleotide Sequence Database Growth (to 2nd Oct 2006)

Taken from: www.ebi.ac.uk

Page 9: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequences

Approx 3,500,000 known for all species (Oct. 2006.)

25,000 for Human

(not counting splice variants and post-translational modifications)

Page 10: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein 3D Structures

Approx 39,000 known (much duplication)

Page 11: Bioinformatics and Computational Molecular Biology Geoff Barton

Biological data in context

Page 12: Bioinformatics and Computational Molecular Biology Geoff Barton

Overview of Biological Hierarchy...

• Ecosystem: many different organisms

• Population: group of the same type of organism

• Family: group with known common lineage

• Whole organism: animal, plant, etc.

• Tissue/organ: brain, heart, lungs, blood, ...

• Cell: nerve, muscle, etc.

• Organelle: nucleus, mitochondria, etc.

• Nucleus

• Chromosome

• Gene

• DNA

• RNA

• Protein Sequence

• Protein 3D structure

• Molecular function

Molecular Levels

Page 13: Bioinformatics and Computational Molecular Biology Geoff Barton

Technology and data in biology

[Biological hierarchy diagram, from ecosystem down to molecular function]

Expression Data (Transcriptomics)

Which of the genes are switched on in which cells/tissues and when?

What are the effects of drugs and disease on expression patterns?

DNA ‘CHIP’ TECHNOLOGY

Page 14: Bioinformatics and Computational Molecular Biology Geoff Barton

Technology and data in biology

[Biological hierarchy diagram, from ecosystem down to molecular function]

Protein Expression Data (Proteomics)

Which proteins are being produced in which cells/tissues when? Which modified forms are present?

What are the effects of drugs and disease on these patterns?

2D Gels + Mass Spectrometry.

Page 15: Bioinformatics and Computational Molecular Biology Geoff Barton

Technology and data in biology

[Biological hierarchy diagram, from ecosystem down to molecular function]

Protein 3D Structure - the bridge to chemistry (Structural Genomics)

What is the atomic level structure of the protein?

What other molecules does it interact with?

What small molecules - potential drugs - does it interact with?

What are the effects of point mutations on the structure?

X-ray crystallography, NMR spectroscopy, single particle, cryo-electron microscopy.

Page 16: Bioinformatics and Computational Molecular Biology Geoff Barton

Overview of Biological Hierarchy...

[Biological hierarchy diagram, from ecosystem down to molecular function]

Macroscopic Levels

Page 17: Bioinformatics and Computational Molecular Biology Geoff Barton

Biology is now a data intensive science

To do good science, you need to know how to use (and not abuse) computational tools.

Page 18: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Structure Prediction

• ‘Homology’ modelling
  – Relies on the fact that similarity of sequence implies similarity of 3D structure.

Page 19: Bioinformatics and Computational Molecular Biology Geoff Barton

Lysozyme (1lz1), α-lactalbumin (1alc)

?

Imagine we don’t know the 3D structure of α-lactalbumin, but we do know its amino acid sequence and that of lysozyme.

Page 20: Bioinformatics and Computational Molecular Biology Geoff Barton

Lysozyme (1lz1), α-lactalbumin (1alc)

37.7% Identity, Z=17.6

?

Page 21: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein structure prediction (Homology Modelling)

• Align sequence of protein of unknown structure to sequence of protein of known structure.

• In ‘conserved core’ of protein, substitute the amino acid types into the known structure.

• Deal with ‘loops’ between the core elements of structure.
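The three steps above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not a modelling program: the alignment is assumed to be supplied as two gapped strings, the function name `plan_model` and the toy sequences are hypothetical, and no coordinates are built.

```python
def plan_model(template_aln, target_aln):
    """Walk a pairwise alignment and decide, per column, how to build the model.

    template_aln: gapped sequence of the protein of known structure
    target_aln:   gapped sequence of the protein of unknown structure
    Returns a list of (template_position, target_residue, action) tuples.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    plan = []
    t_pos = 0  # 1-based residue counter along the template
    for t_res, q_res in zip(template_aln, target_aln):
        if t_res != "-":
            t_pos += 1
        if t_res == "-" and q_res == "-":
            continue  # empty column, nothing to model
        if t_res == "-":
            # Target has extra residues: no template coordinates exist.
            plan.append((None, q_res, "insertion: build loop"))
        elif q_res == "-":
            # Template residue has no partner: the chain must be rejoined.
            plan.append((t_pos, None, "deletion: close gap"))
        elif t_res == q_res:
            plan.append((t_pos, q_res, "keep side chain"))
        else:
            plan.append((t_pos, q_res, "substitute side chain"))
    return plan


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    for step in plan_model("KVFER-CEL", "KQFTKGC-L"):
        print(step)
```

Every column marked as an insertion or deletion is a place where the model cannot simply copy the template, which is why a bad alignment leads directly to a bad model.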

Page 22: Bioinformatics and Computational Molecular Biology Geoff Barton

Lysozyme (1lz1), α-lactalbumin (1alc)

37.7% Identity, Z=17.6
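As a rough guide to what the two numbers mean: percentage identity counts aligned positions that hold the same residue, and a Z-score expresses how far the real alignment score lies above the scores obtained when one sequence is shuffled. The sketch below is a minimal illustration under stated assumptions. The slide does not say which scoring scheme, gap penalties or identity denominator were used, so the identity scoring (match 1, mismatch 0, gap -1), the choice to count only non-gap columns, and the function names are all assumptions made here.

```python
import random
import statistics


def percent_identity(aln_a, aln_b):
    """Percentage of aligned (non-gap) column pairs holding identical residues."""
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def nw_score(a, b, match=1, mismatch=0, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def z_score(a, b, n_shuffles=100, seed=0):
    """(real score - mean shuffled score) / standard deviation of shuffled scores."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        residues = list(b)
        rng.shuffle(residues)  # same composition, random order
        shuffled_scores.append(nw_score(a, "".join(residues)))
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        return float("inf")
    return (real - statistics.mean(shuffled_scores)) / sd
```

Shuffling keeps the amino acid composition and length fixed, so a high Z-score indicates that the order of the residues, and not just their frequencies, is responsible for the match.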

Page 23: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein structure prediction (Homology modelling)

• Problems:
  – Need protein of known structure that is similar in sequence.
  – Building loops where there are deletions.
  – Verifying model.

• Key is getting a good alignment in the first place
  – Bad alignment => bad model.

Page 24: Bioinformatics and Computational Molecular Biology Geoff Barton

Good alignment on its own can:

• Identify key residues (absolutely conserved)

• Identify likely protein core (conserved hydrophobic residues)

• Help predict protein secondary structure (not this lecture).
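The first two points can be illustrated by scanning the columns of a multiple alignment. The sketch below is a simplified example rather than an established conservation measure: the hydrophobic residue set is an assumption (definitions vary), the function name `classify_columns` is hypothetical, and the three short sequences are invented.

```python
# Assumed hydrophobic set for illustration; published definitions differ.
HYDROPHOBIC = set("AVLIMFWC")


def classify_columns(alignment):
    """Label each column of a multiple alignment (list of equal-length strings)."""
    labels = []
    for column in zip(*alignment):
        residues = set(column)
        if "-" in residues:
            labels.append("gap")          # likely loop region
        elif len(residues) == 1:
            labels.append("identical")    # candidate key residue
        elif residues <= HYDROPHOBIC:
            labels.append("hydrophobic")  # candidate core position
        else:
            labels.append("variable")
    return labels


if __name__ == "__main__":
    toy = [
        "GDSGGPLVI",
        "GDSGGPVIC",
        "GDSGSP-LV",
    ]
    for position, label in enumerate(classify_columns(toy), start=1):
        print(position, label)
```

With only three sequences almost any column can look conserved by chance, so labels like these become meaningful only for alignments of many diverse family members.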

Page 25: Bioinformatics and Computational Molecular Biology Geoff Barton

Sequence alignment is a fundamental technique in molecular biology.

• May predict proteins of common function even when no 3D structure is known.

• May be used to predict 3D structure and so help understanding of mutants.

• Some examples of where this is right and wrong...

Page 26: Bioinformatics and Computational Molecular Biology Geoff Barton

Prediction of structure and function by similarity to known sequences and structures

Assumption is that similar sequence implies similar structure and function.

But what do we mean by “similar”?

Does similarity of sequence really imply similarity of function?

Page 27: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequence/Structure/Function Network

Sequence 3D Structure Function

Similar Similar Similar

Different Different Different

Page 29: Bioinformatics and Computational Molecular Biology Geoff Barton

Similar Sequence, Similar Structure, Similar Function.

e.g. Trypsin-like Serine Proteinases

Same fold, same catalytic mechanism.

But DIFFERENT specificity.

e.g. Immunoglobulin variable domains.

Same fold, similar binding function.

But DIFFERENT specificity.

This is true of all examples: similarities only give clues to function, and differences in specificity can be regarded as differences of function.

Page 30: Bioinformatics and Computational Molecular Biology Geoff Barton

Immunoglobulin Variable Domains

e.g. see: 1a2y

Page 31: Bioinformatics and Computational Molecular Biology Geoff Barton

Tryptophan at core of Ig variable domain

Page 32: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequence/Structure/Function Network

Sequence 3D Structure Function

Similar Similar Similar

Different Different Different

Page 33: Bioinformatics and Computational Molecular Biology Geoff Barton

Lysozyme (1lz1), α-lactalbumin (1alc)

37.7% Identity, Z=17.6

Page 34: Bioinformatics and Computational Molecular Biology Geoff Barton

ε-crystallin/L-Lactate Dehydrogenase

Page 35: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequence/Structure/Function Network

Sequence 3D Structure Function

Similar Similar Similar

Different Different Different

Page 36: Bioinformatics and Computational Molecular Biology Geoff Barton

Trypsin (3ptn) Subtilisin (2sec)

Page 37: Bioinformatics and Computational Molecular Biology Geoff Barton

Trypsin (3ptn) Subtilisin (2sec)

Page 38: Bioinformatics and Computational Molecular Biology Geoff Barton

Trypsin (3ptn): His-57, Asp-102, Ser-195

Subtilisin (2sec): Asp-32, His-64, Ser-221
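One way to see why these two enzymes are not simply rearranged copies of each other is to sort each catalytic triad by residue number. The short sketch below uses only the residue numbers given on the slide; the dictionary layout and the function name `order_in_sequence` are illustrative choices.

```python
# Catalytic triad residues and their sequence numbers, as listed on the slide.
TRIADS = {
    "Trypsin (3ptn)": {"His": 57, "Asp": 102, "Ser": 195},
    "Subtilisin (2sec)": {"Asp": 32, "His": 64, "Ser": 221},
}


def order_in_sequence(triad):
    """Return the residue types in the order they occur along the chain."""
    return [name for name, _ in sorted(triad.items(), key=lambda item: item[1])]


if __name__ == "__main__":
    for protein, triad in TRIADS.items():
        print(protein, "-".join(order_in_sequence(triad)))
```

The same three residue types occur in the order His, Asp, Ser along the trypsin chain but Asp, His, Ser along the subtilisin chain, so no simple sequence alignment can place all three catalytic residues in matching columns.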

Page 39: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequence/Structure/Function Network

Sequence 3D Structure Function

Similar Similar Similar

Different Different Different

Page 40: Bioinformatics and Computational Molecular Biology Geoff Barton

Nature 398,84-90, 1999

PDB: 1b47

Page 41: Bioinformatics and Computational Molecular Biology Geoff Barton

11% sequence ID

rmsd 1.47 Å over 70 residues

PDB: 1b47
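The rmsd (root mean square deviation) quoted for a structural comparison is the square root of the mean squared distance between equivalent atoms after the two structures have been superposed. The sketch below is a minimal illustration that assumes the superposition has already been done and that one atom per equivalent residue (typically the Cα atom, which is an assumption here, not something the slide states) is supplied for each structure.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two toy "structures" of two atoms each, offset by 1 Å along z.
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # prints 1.0
```

An rmsd is only meaningful together with the number of equivalent residues it was calculated over, which is why the slide quotes both.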

Page 42: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequence/Structure/Function Network

Sequence 3D Structure Function

Similar Similar Similar

Different Different Different

Page 43: Bioinformatics and Computational Molecular Biology Geoff Barton

Russell, R. B. and Barton, G. J. (1993), "An SH2-SH3 Domain hybrid", Nature, 364, 765.

PDB: 1bia PDB: 2ptk

Page 44: Bioinformatics and Computational Molecular Biology Geoff Barton

PDB:2aai PDB:1bas

Page 45: Bioinformatics and Computational Molecular Biology Geoff Barton

Matthews, S., et al. (1994), "The p17 Matrix Protein from HIV-1 is Structurally Similar to Interferon-gamma", Nature, 370, 666-668.

Page 46: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequence/Structure/Function Network

Sequence 3D Structure Function

Similar Similar Similar

Different Different Different

Does this ever happen?

Page 47: Bioinformatics and Computational Molecular Biology Geoff Barton

HIV Reverse Transcriptase (RT)

Page 48: Bioinformatics and Computational Molecular Biology Geoff Barton

HIV Reverse Transcriptase (RT)

Page 49: Bioinformatics and Computational Molecular Biology Geoff Barton

HIV Reverse Transcriptase (RT) - domain linkers

Page 50: Bioinformatics and Computational Molecular Biology Geoff Barton

Protein Sequence and Structural Similarity

• Type: Homologous (scop family)
  – Similarity: Similar Structure, Similar Sequence, Similar Function
  – Find by: Pair-wise Sequence Comparison (BLAST/FASTA/Smith-Waterman)

• Type: ‘Remote Homologue’ (scop superfamily)
  – Similarity: Similar Structure, Weakly Similar Sequence, Similar Function
  – Find by: Profile, Iterative Search (e.g. PSI-BLAST). Threading/fold recognition?

• Type: Analogue (scop fold)
  – Similarity: Similar Structure, No sequence similarity, Often no functional similarity
  – Find by: Solve BOTH structures by X-ray/NMR methods. Mapping?

Page 52: Bioinformatics and Computational Molecular Biology Geoff Barton

Barton, G. J. et al. (1992), "Human Platelet Derived Endothelial Cell Growth Factor is Homologous to E. coli Thymidine Phosphorylase", Prot. Sci., 1, 688-690.

Page 54: Bioinformatics and Computational Molecular Biology Geoff Barton

Barton, G. J., Cohen, P. T. C. and Barford, D. (1994), "Conservation Analysis and Structure Prediction of the Protein Serine/Threonine Phosphatases: Sequence Similarity with Diadenosine Tetra-phosphatase from E. coli Suggests Homology to the Protein Phosphatases", Eur. J. Biochem., 220, 225-237.

Page 56: Bioinformatics and Computational Molecular Biology Geoff Barton

Russell, R. B. and Barton, G. J. (1993), "An SH2-SH3 Domain hybrid", Nature, 364, 765.

Page 57: Bioinformatics and Computational Molecular Biology Geoff Barton

Reading material for this lecture:

This lecture itself. pdf’s for “Barton” papers: www.compbio.dundee.ac.uk/ftp/pdf/

Database statistics: http://www.ebi.ac.uk/embl/

Meng, W., Sawasdikosol, S., Burakoff, S. J. and Eck, M. J. (1999), "Structure of the amino-terminal domain of Cbl complexed to its binding site on ZAP-70 kinase", Nature, 398, 84-90. (Available on-line at www.nature.com - search for ZAP-70 kinase - republished in December on-line.)

Kuriyan, J. and Darnell, J. E. (1999), "Protein recognition: An SH2 domain in disguise", Nature, 398, 22-25. (News and views article for the above paper.)

Russell, R. B. and Barton, G. J. (1993), "An SH2-SH3 Domain hybrid", Nature, 364, 765.

Matthews, S., et al. (1994), "The p17 Matrix Protein from HIV-1 is Structurally Similar to Interferon-gamma", Nature, 370, 666-668.

Barton, G. J., Cohen, P. T. C. and Barford, D. (1994), "Conservation Analysis and Structure Prediction of the Protein Serine/Threonine Phosphatases: Sequence Similarity with Diadenosine Tetra-phosphatase from E. coli Suggests Homology to the Protein Phosphatases", Eur. J. Biochem., 220, 225-237.

Page 58: Bioinformatics and Computational Molecular Biology Geoff Barton

The end of Lecture 1

Lecture 2 will be on sequence comparison methods.