24
Computational Structural Biology in Post-genomic Era Ivet Bahar Ivet Bahar Department of Computational Biology Department of Computational Biology and Department of Molecular Genetics & Biochemistry and Department of Molecular Genetics & Biochemistry School of Medicine, University of Pittsburgh School of Medicine, University of Pittsburgh

Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

  • Upload
    others

  • View
    4

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Computational Structural Biologyin

Post-genomic Era

Ivet BaharIvet Bahar

Department of Computational BiologyDepartment of Computational Biologyand Department of Molecular Genetics & Biochemistry and Department of Molecular Genetics & Biochemistry

School of Medicine, University of PittsburghSchool of Medicine, University of Pittsburgh

Page 2: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

C. Branden and J. Tooze. Introduction to Protein Structure. 2nd edition, Garland Publishing Inc., New York, 1999.

C. L. Brooks, III, M. Karplus, and B.M. Pettitt. A Theoretical Perspective of Dynamics, Structure, and Thermodynamics. Wiley Interscience, New York, 1988.

C.R. Cantor and P.R. Schimmel. Biophysical Chemistry. Vol.1,2,3. W.H. Freeman and Company, San Fransisco, 1980. T.E. Creighton, Editor. Protein Folding. W.H. Freeman & Company, New York, 1992.

A. Fersht. Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding.W. H. Freeman and Company, New York, 1999.

L.M. Gierasch and J. King, Editors. Protein Folding, Deciphering the Second Half of the Genetic Code. AAAS, Washington D.C., 1990.

A.Y. Grosberg and A.R. Khokhlov. Giant Molecules. Here, There, and Everywhere... Academic Press, San Diego, California, 1997.

A. R. Leach. Molecular Modelling. Principles and Applications. Addison Wesley Longman, Essex, England, 1996.

J.A. McCammon and S.C. Harvey. Dynamics of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, 1987.

G.E. Schulz and R.H. Schirmer. Principles of Protein Structure. Springer Advanced Texts in Chemistry, Springer-Verlag, New York, 1990.

L. Stryer. Biochemistry. W.H. Freeman, New York, latest edition.

References

Page 3: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

2001 2001 –– Draft version of human genome publishedDraft version of human genome published

• 1990 - Human Genome Project (HGP) launched• 1993 – 1st five-year plan published• 1995 - first bacterial genome published (haemophilus influenza)• 1996 – yeast genome sequenced• 1997 – E coli genome sequenced• 1998 – C elegans genome• 1998 – 2nd five-year plan for HGP• 2000 – Fruit fly genome• 2002 – rice genome – 1st draft• 2002 –mouse genome – 1st draft• 2003 – HGP completed

I. Recent progresses

Page 4: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

GENOME SEQUENCING PROJECTSE. coli

Genome: 4.6 million nucleotides 4289 proteins

HumanGenome: 3 billion nucleotides

~30-40,000 proteins

Page 5: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Image: Digital Vision, PhotoDisc, Matt Ray/EHP c

The genomes of many species have been sequenced to date...

... but limited information is conveyed from sequence about how genomes and proteomes give rise to biological function.

Page 6: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Exponential growth inExponential growth in

Sequential, structural , genetic and biomedical dataComputational technology

Rost, B. (1998). Marrying structure and genomics. Structure 6, 259-263

Presently: 35,579 structures in the PDB

Berman et al: The Protein Data Bank. NAR 28, 235-242 (2000).

Structural genomicsFunctional genomicsProteomics

As of April 2004, there are over 38,989,342,565 bases.in Entrez nucleotide DB

Page 7: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

The race to computerize biologyDec 12th 2002 From The Economist print edition

“In life-sciences establishments around the world, the laboratory rat is giving way to the computer mouse—as computing joins forces with biology to create a bioinformatics market that is expected to be worth nearly $40 billion within three years”

“Wet lab processes are giving way to digital research done in silico”

Biotech and pharmaceutical industry became one of the biggest consumers of computing power,

supercomputing powers of petaflops (~ 1012

floating-point operations per sec)

Storage capacity of terabytes (~ 109 of bytes)

““A big risk of computer modeling and other A big risk of computer modeling and other tools is to rely too much on them.tools is to rely too much on them.””

Promising Future for Computational BiologyPromising Future for Computational Biology

Page 8: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Bioinformatics Moves to Center Stage in the Genetic Revolution

“By embarking on Bioinformatics and Computational Biology initiatives, the NIH Roadmapis paving a future “information superhighway” dedicated to advancing medical research”http://nihroadmap.nih.gov/bioinformatics/index.asp

Data/information Knowledge?

Computational Biology

A multidisciplinary fieldA multidisciplinary field encompassingencompassing

molecularmolecular--toto--cellular cellular modelingmodeling of structure and of structure and functionfunction

physically inspired physically inspired simulationsimulation and visualization of and visualization of complex processes at multiple scalescomplex processes at multiple scales

elucidation of the mechanism of operation of elucidation of the mechanism of operation of biological biological systemssystems (networks of interactions)(networks of interactions)

IInn a special collection of articles a special collection of articles published beginning 6 February 2004, published beginning 6 February 2004, ScienceScience Magazine and its online Magazine and its online companion sites team up to explore the companion sites team up to explore the interface between mathematics and interface between mathematics and biology biology

• Biological Pathways, and Networks• Molecular Libraries and Imaging• Structural Biology• Bioinformatics and Computational Biology• Nanomedicine

Page 9: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Structure & energeticsSpace & time dependence3D-models & simulationsPrinciples of statistical

mechanics, thermodynamics,physics, chemistry

Bottlenecks: size of systems, spatial aspects,simulation time

Database analysisPattern recognition1-d arrays Homology modelingStatistical approachesComputationally driven

Bottlenecks: size of databases,noise of data,normalization issues

Computational Biology Computational Biology BioinformaticsBioinformatics

Page 10: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Computational Computational StructuralStructural BiologyBiologyComputational Computational GenomicsGenomicsSystemsSystems // Mathematical BiologyMathematical BiologyComputational Computational NeurobiologyNeurobiologyBioimageBioimage InformaticsInformatics

Five areas of specializationFive areas of specialization

Page 11: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Proteomics – Examining all proteins encoded by a given genome

No ‘new’ folds deposited in the PDB in 2004-2006

A more integrated view of

• structural dynamics• protein-protein interactionsor• systems level assessment of pathways, networks, and their dynamics

is now possible.

Page 12: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

from sequence,

We need to understand the We need to understand the physicalphysical principles principles that underlie the passagethat underlie the passage

to structure...

to dynamics...

Page 13: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

523 THR

521 THR

522 ALA

524 GLU

527 PTR

from interacting atoms...

to interacting molecules...

Page 14: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

To interaction networks at the cellular scale...

CelllularCelllular networks are usually described by networks are usually described by simple masssimple mass--action kineticsaction kinetics

Page 15: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC
Page 16: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

LifeLife’’s complexity pyramids complexity pyramid

Oltvai & Barabasi, Science 298, 763-764, 2002.

Page 17: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Interaction networks Interaction networks –– at all scalesat all scales

In a special collection of articles published beginning 6 February2004, Science Magazine and its online companion sites team up to explore the interface between mathematics and biology

Page 18: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

ProteomicsProteomics

Examination of all proteins encoded by a given genomeExamination of all proteins encoded by a given genome

Figure: courtesy of Mark Gerstein 2003, Yale U

Page 19: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Sequence --------------> Structure

Function

‘Protein folding problem’

Bioinformatics. Sequence alignments

Fundamental paradigm: Sequence encodes structure; structure encodes function

Mod

elling

and s

imula

tions

Page 20: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Stuctures suggest mechanisms of function

A. Comparison of static structures available in the PDB for the same protein in different form has been widely used as an indirect method of inferring dynamics.

B. NMR structures provide information on fluctuation dynamics

Bahar et al. J. Mol. Biol. 285, 1023, 1999.

Page 21: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Pennisi, E. (1998) Science 279, 978; Hubbard et al. (1999) Nucleic Acids Res 254.

Classification of structural data (SCOP)Classification of structural data (SCOP)

Page 22: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

Primary SequenceMNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA

PROTEINSSequence Structure

3D Structure

Folding

Page 23: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

FACTS:

Each sequence folds into a unique structure – native structureProteins are functional only in their native stateSequence structure mapping is not yet understoodFolding is reversible – unfolding and re-folding is possible

Page 24: Computational Structural BiologyComputational Structural Biology in Post-genomic Era Ivet Bahar Department of Computational Biology and Department of Molecular Genetics & BiochemistryC

6/6/20066/6/2006 2424

Protein folding problem:Protein folding problem:““Predicting 3Predicting 3--dimensional structure from sequencedimensional structure from sequence””

A unique folded structure (native conformation, native fold) is A unique folded structure (native conformation, native fold) is assumed by a given sequence, although infinitely many assumed by a given sequence, although infinitely many conformations can be accessed. conformations can be accessed. Which? (Protein folding problem)Which? (Protein folding problem)How, why? (Folding kinetics) How, why? (Folding kinetics)

Basic postulate: Thermodynamic equilibrium Global energy minimum