BioInformatics (1)

BioInformatics (1). What is Life All About : Self-compiling & self-assembling Complementary surfaces Watson-Crick base pair (Nature April 25, 1953)




What is Life All About: Self-compiling & self-assembling

Complementary surfaces: Watson-Crick base pair (Nature, April 25, 1953)


Life Science vs Computing

Where do parasites come from? (computer & biological viral codes)

Over $12 billion/year on computer viruses

LoveBug

Set dirtemp = fso.GetSpecialFolder(2)
Set c = fso.GetFile(WScript.ScriptFullName)
c.Copy(dirsystem&"\MSKernel32.vbs")
c.Copy(dirwin&"\Win32DLL.vbs")
c.Copy(dirsystem&"\LOVE-LETTER-FOR-YOU.TXT.vbs")
regruns()
html()
spreadtoemail()
listadriv()

20 M dead (worse than the Black Plague & the 1918 flu)

AIDS - HIV-1 Polymerase drug resistance mutations

M41L, D67N, T69D, L210W, T215Y, H208Y

PISPIETVPV KLKPGMDGPK VKQWPLTEEK
IKALIEICAE LEKDGKISKI GPVNPYDTPV
FAIKKKNSDK WRKLVDFREL NKRTQDFCEV
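The mutation codes above use the common notation of wild-type residue, position, mutant residue (M41L: M at position 41 replaced by L). The following is a minimal Python sketch of reading that notation against the sequence fragment shown. It assumes the fragment is numbered from 1 at its first residue, which the slide does not state; `parse_mutation` and `residue_status` are illustrative names, not part of the slides.

```python
import re

# The 90 residues shown on the slide, with the spaces removed.
FRAGMENT = (
    "PISPIETVPV" "KLKPGMDGPK" "VKQWPLTEEK"
    "IKALIEICAE" "LEKDGKISKI" "GPVNPYDTPV"
    "FAIKKKNSDK" "WRKLVDFREL" "NKRTQDFCEV"
)

MUTATIONS = ["M41L", "D67N", "T69D", "L210W", "T215Y", "H208Y"]


def parse_mutation(code):
    """Split a code such as 'M41L' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def residue_status(sequence, code):
    """Report which residue the sequence carries at the mutated position.

    Assumption: 1-based numbering starting at the first residue of `sequence`.
    """
    wild_type, position, mutant = parse_mutation(code)
    if position > len(sequence):
        return "outside fragment"
    residue = sequence[position - 1]
    if residue == mutant:
        return "mutant"
    if residue == wild_type:
        return "wild type"
    return f"other ({residue})"


for code in MUTATIONS:
    print(code, residue_status(FRAGMENT, code))
```

Under that numbering assumption, the fragment carries the mutant residue at positions 41, 67 and 69; the other three positions lie beyond the 90 residues shown.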


Concept         Computers          Organisms

Instructions    Program            Genome
Bits            0,1                a,c,g,t
Stable memory   ROM, Disk, tape    DNA
Active memory   RAM                RNA
Processing      CPU/Compiler       enzyme/Ribosome
Editing         Editor             tRNA
Environment     Sockets, people    Water, salts, heat
I/O             AD/DA              proteins
Monomer         Minerals           Nucleotide
Polymer         chip               DNA, RNA, protein
Replication     Cut/Paste          DNA replication
Sensor/In       scanner            Chem/photo receptor

Exciting Life ??


Elements

of RNA-based life: C, H, N, O, P

Useful for many species: Na, K, Fe, Cl, Ca, Mg, Mo, Mn, S, Se, Cu, Ni, Co, Si


The Four Nucleosides of DNA

dA dG dC dT

A nucleoside is a sugar, here deoxyribose, plus a base

dA = deoxyadenosine, etc.

PURINES (dA, dG)    PYRIMIDINES (dC, dT)


BASES

Purines: Adenine, Guanine

Pyrimidines: Thymine, Cytosine, Uracil


Base Pairing
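
In Watson-Crick pairing, A pairs with T and G pairs with C, so one DNA strand determines its partner. A minimal Python sketch of that rule follows; the function names are illustrative, and only the four unambiguous DNA bases are handled.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base complement, read in the same direction."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the partner strand written 5' to 3'."""
    return complement(strand)[::-1]


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

Reversing the complement reflects that the two strands of the double helix run antiparallel.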


A nucleotide is a phosphate, a sugar, and a purine or a pyrimidine base.

The monomeric units of nucleic acids are called nucleotides.


Chromosomes


Genome and gene

Entity    Definition                                      Molecular Mechanisms

Genome    Unit of information transmission                DNA replication

Gene      Unit of information expression                  Transcription to RNA
          (a special sequence of nucleotide bases,        Translation to protein
          whose sequences carry the information
          required for constructing protein)


Nucleic acid and proteins

Macromolecule   Backbone               Repeating unit                       Length        Role

Nucleic acid
  DNA           Phosphodiester bonds   Deoxyribonucleotides (A, C, G, T)    10^3-10^8     Genome
  RNA           Phosphodiester bonds   Ribonucleotides (A, C, G, U)         10^3-10^5     Genome
                                                                            10^3-10^4     Messenger
                                                                            10^2-10^3     Gene product

Protein         Peptide bonds          Amino acids                          10^2-10^3     Gene product
                                       (A, C, D, E, F, G, H, I, K, L,
                                       M, N, P, Q, R, S, T, V, W, Y)

Protein: structural components of cells/tissues, and enzymes.


Nucleotide codes

Code  Meaning               Code  Meaning
A     Adenine               W     Weak (A or T)
G     Guanine               S     Strong (G or C)
C     Cytosine              M     Amino (A or C)
T     Thymine               K     Keto (G or T)
U     Uracil                B     Not A (G or C or T)
R     Purine (A or G)       H     Not G (A or C or T)
Y     Pyrimidine (C or T)   D     Not C (A or G or T)
N     Any nucleotide        V     Not T (A or G or C)
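
The ambiguity codes above can be used to describe a family of sequences with one pattern. As a hedged illustration, this Python sketch expands each code into a regular-expression character class and tests sequences against it; the names `IUPAC`, `iupac_to_regex` and `matches` are illustrative only.

```python
import re

# Each code from the table above mapped to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "W": "AT", "S": "GC", "M": "AC", "K": "GT",
    "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Turn a pattern written in nucleotide codes into a regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def matches(pattern, sequence):
    """True if the whole sequence is one of the sequences the pattern allows."""
    return re.fullmatch(iupac_to_regex(pattern), sequence.upper()) is not None


print(iupac_to_regex("GAWTC"))    # GA[AT]TC
print(matches("GAWTC", "GATTC"))  # True
print(matches("GAWTC", "GACTC"))  # False
```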


Amino acid codes

Ala   A   Alanine
Arg   R   Arginine
Asn   N   Asparagine
Asp   D   Aspartic acid
Cys   C   Cysteine
Gln   Q   Glutamine
Glu   E   Glutamic acid
Gly   G   Glycine
His   H   Histidine
Ile   I   Isoleucine
Leu   L   Leucine
Lys   K   Lysine
Met   M   Methionine
Phe   F   Phenylalanine
Pro   P   Proline
Ser   S   Serine
Thr   T   Threonine
Trp   W   Tryptophan
Tyr   Y   Tyrosine
Val   V   Valine
Asx   B   Asn or Asp
Glx   Z   Gln or Glu
Sec   U   Selenocysteine
Unk   X   Unknown


Standard Genetic Code
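
The slide's code table did not survive extraction. As a stand-in, here is a minimal Python sketch of the standard genetic code and its use in translation. It builds the 64-codon table from the compact amino-acid string commonly used for the standard code (codons ordered T, C, A, G at each position). The example DNA is made up for illustration: it merely encodes the residues MRAWL and is not taken from any database entry.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence from its first base up to the first stop."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative input only: ATG AGA GCT TGG CTG TAA
print(translate("ATGAGAGCTTGGCTGTAA"))  # MRAWL
```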


Schematic illustration of a plant cell (Home for DNA)


History of structure determination for nucleic acids and proteins
(technology development and structure determination)

1950
  1949  Edman degradation
  1951  α-helix model
  1953  DNA double helix model
  1953  Insulin primary structure
  1954  Isomorphous replacement

1960
  1960  Myoglobin tertiary structure
  1962  Restriction enzyme
  1965  tRNA(Ala) primary structure

1970
  1972  DNA cloning
  1973  tRNA(Phe) tertiary structure
  1975  DNA sequencing
  1977  φX174 complete genome
  1979  Z-DNA by single crystal diffraction

1980
  1984  Pulsed-field gel electrophoresis
  1985  Polymerase chain reaction
  1986  Protein structure by 2D NMR
  1987  YAC vector
  1988  Human Genome Project

1990
  1993  DNA chip
  1995  H. influenzae complete genome

2000


Human chromosomes: idiograms


X-linked recessive disorder. The inheritance pattern is shown for a recessive gene on the chromosome X, designated in bold.

Parents:
Male XY (normal)    Female XX (normal)

Offspring:
Female XX (normal)    Female XX (normal)    Male XY (normal)    Male XY (affected)
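
The pattern above can be reproduced by enumerating the four ways the parents' sex chromosomes combine. This Python sketch assumes the mother is an unaffected carrier (one normal X, one X carrying the recessive gene) and the father is unaffected, which is what the pictured outcome implies. The allele symbols `X` (normal) and `x` (recessive) and the function names are illustrative.

```python
from collections import Counter
from itertools import product


def classify(child):
    """Describe a child given its two sex chromosomes, e.g. ('x', 'Y')."""
    x_alleles = [allele for allele in child if allele != "Y"]
    if "Y" in child:
        # Males have a single X, so one recessive copy is enough.
        return "male, affected" if x_alleles == ["x"] else "male, normal"
    if x_alleles == ["x", "x"]:
        return "female, affected"
    if "x" in x_alleles:
        return "female, normal (carrier)"
    return "female, normal"


def offspring(mother, father):
    """Count outcomes over every combination of one chromosome per parent."""
    return Counter(classify(pair) for pair in product(mother, father))


# Assumed genotypes: carrier mother, unaffected father.
for outcome, count in offspring(("X", "x"), ("X", "Y")).items():
    print(f"{count}/4  {outcome}")
```

Each of the four outcomes appears once: two unaffected daughters (one of them a carrier), one unaffected son and one affected son.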


Reductionistic and synthetic approaches in biology

Biological System (Organism)

Building Blocks (Genes/Molecules)

Synthetic Approach (Bioinformatics)

Reductionistic Approach (Experiments)


Basic principles in physics, chemistry and biology.

                     Physics                 Chemistry    Biology

                     Matter                  Compound     Organism
                     Elementary particles    Elements     Genes

Principles known?    Yes                     Yes          No


[Figure: growth over time, 1965-2000. x-axis: Year. y-axis: Amount (x1000), logarithmic scale from 0.001 to 100 000. Series: Transistors / chip, DNA sequences, Mapped human genes, 3-D structures, MEDLINE records, MEDLINE G5 MeSH.]


The addresses for the major databases

Database Organization Address

MEDLINE National Library of Medicine www.nlm.nih.gov

GenBank National Center for Biotechnology Information www.ncbi.nlm.nih.gov

EMBL European Bioinformatics Institute www.ebi.ac.uk

DDBJ National Institute of Genetics, Japan www.ddbj.nig.ac.jp

SWISS-PROT Swiss Institute of Bioinformatics www.expasy.ch

PIR National Biomedical Research Foundation www-nbrf.georgetown.edu

PRF Protein Research Foundation, Japan www.prf.or.jp

PDB Research Collaboratory for Structural Bioinformatics www.rcsb.org

CSD Cambridge Crystallographic Data Centre www.ccdc.cam.ac.uk


New generation of molecular biology databases

Information                            Database        Address

Compounds and reactions                LIGAND          www.genome.ad.jp/dbget/ligand.html
                                       AAindex         www.genome.ad.jp/dbget/aaindex.html

Protein families and sequence motifs   PROSITE         www.expasy.ch/sprot/prosite.html
                                       Blocks          www.blocks.fhcrc.org/
                                       PRINTS          www.biochem.ucl.ac.uk/bsm/dbbrowser/PRINTS/
                                       Pfam            www.sanger.ac.uk/Pfam/, pfam.wustl.edu/
                                       ProDom          protein.toulouse.inra.fr/prodom.html

3D fold classifications                SCOP            scop.mrc-lmb.cam.ac.uk/scop/
                                       CATH            www.biochem.ucl.ac.uk/bsm/cath/

Orthologous genes                      COG             www.ncbi.nlm.nih.gov/COG/
                                       KEGG            www.genome.ad.jp/kegg/

Biochemical pathways                   KEGG            www.genome.ad.jp/kegg/
                                       WIT             www.mcs.anl.gov/WIT2/
                                       EcoCyc          ecocyc.PangeaSystems.com/ecocyc/
                                       UM-BBD          www.labmed.umn.edu/umbbd/

Genome diversity                       NCBI Taxonomy   www.ncbi.nlm.nih.gov/Taxonomy/
                                       OMIM            www.ncbi.nlm.nih.gov/Omim/


Example of sequence database entry for GenBank

LOCUS       DRODPPC      4001 bp            INV       15-MAR-1990
DEFINITION  D.melanogaster decapentaplegic gene complex (DPP-C), complete cds.
ACCESSION   M30116
KEYWORDS    .
SOURCE      D.melanogaster, cDNA to mRNA.
  ORGANISM  Drosophila melanogaster
            Eukaryota; mitochondrial eukaryotes; Metazoa; Arthropoda;
            Tracheata; Insecta; Pterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 4001)
  AUTHORS   Padgett, R.W., St Johnston, R.D. and Gelbart, W.M.
  TITLE     A transcript from a Drosophila pattern gene predicts a protein
            homologous to the transforming growth factor-beta family
  JOURNAL   Nature 325, 81-84 (1987)
  MEDLINE   87090408
COMMENT     The initiation codon could be at either 1188-1190 or 1587-1589
FEATURES             Location/Qualifiers
     source          1..4001
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
     mRNA            <1..3918
                     /gene="dpp"
                     /note="decapentaplegic protein mRNA"
                     /db_xref="FlyBase:FBgn0000490"
     gene            1..4001
                     /note="decapentaplegic"
                     /gene="dpp"
                     /allele=""
                     /db_xref="FlyBase:FBgn0000490"
     CDS             1188..2954
                     /gene="dpp"
                     /note="decapentaplegic protein (1188 could be 1587)"
                     /codon_start=1
                     /db_xref="FlyBase:FBgn0000490"
                     /db_xref="PID:g157292"
                     /translation="MRAWLLLLAVLATFQTIVRVASTEDISQRFIAAIAPVAAHIPLASASGSGSG
                     RSGSRSVGASTSTALAKAFNPFSEPASFSDSDKSHRSKTNKKPSKSDANR
                     ……………………
                     LGYDAYYCHGKCPFPLADHFNSTNAVVQTLVNNMNPGKVPKACCVPTQLDSVAMLYLNDQ
                     STBVVLKNYQEMTBBGCGCR"
BASE COUNT     1170 a   1078 c    956 g    797 t
ORIGIN
        1 gtcgttcaac agcgctgatc gagtttaaat ctataccgaa atgagcggcg gaaagtgagc
       61 cacttggcgt gaacccaaag ctttcgagga aaattctcgg acccccatat acaaatatcg
      121 gaaaaagtat cgaacagttt cgcgacgcga agcgttaaga tcgcccaaag atctccgtgc
      181 ggaaacaaag aaattgaggc actattaaga gattgttgtt gtgcgcgagt gtgtgtcttc
      241 agctgggtgt gtggaatgtc aactgacggg ttgtaaaggg aaaccctgaa atccgaacgg
      301 ccagccaaag caaataaagc tgtgaatacg aattaagtac aacaaacagt tactgaaaca
      361 gatacagatt cggattcgaa tagagaaaca gatactggag atgcccccag aaacaattca
      421 attgcaaata tagtgcgttg cgcgagtgcc agtggaaaaa tatgtggatt acctgcgaac
      481 cgtccgccca aggagccgcc gggtgacagg tgtatccccc aggataccaa cccgagccca
      541 gaccgagatc cacatccaga tcccgaccgc agggtgccag tgtgtcatgt gccgcggcat
      601 accgaccgca gccacatcta ccgaccaggt gcgcctcgaa tgcggcaaca caattttcaa
          ………………………….
     3841 aactgtataa acaaaacgta tgccctataa atatatgaat aactatctac atcgttatgc
     3901 gttctaagct aagctcgaat aaatccgtac acgttaatta atctagaatc gtaagaccta
     3961 acgcgtaagc tcagcatgtt ggataaatta atagaaacga g
//
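
Some fields of the entry above can be cross-checked with simple arithmetic. This Python sketch assumes a CDS location is a plain `start..end` range and that the coding region ends with one stop codon; the function names are illustrative and this is not a general GenBank parser. It derives the expected protein length from the CDS location and sums the BASE COUNT line.

```python
import re


def cds_length(location):
    """Number of bases covered by a simple 'start..end' location."""
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = map(int, match.groups())
    return end - start + 1


def expected_protein_length(location):
    """Codons in the CDS minus one, assuming it ends with a stop codon."""
    bases = cds_length(location)
    if bases % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return bases // 3 - 1


def parse_base_count(line):
    """Read a 'BASE COUNT' line into a dict such as {'a': 1170, ...}."""
    return {base: int(count) for count, base in re.findall(r"(\d+)\s+([acgt])\b", line)}


print(expected_protein_length("1188..2954"))  # 588
print(expected_protein_length("1587..2954"))  # 455, for the alternative start
counts = parse_base_count("BASE COUNT     1170 a   1078 c    956 g    797 t")
print(sum(counts.values()))                   # 4001, the length on the LOCUS line
```

The result of 588 residues agrees with the sequence length given in the SWISS-PROT entry for the same protein.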


Example of sequence database entry for SWISS-PROT

ID   DECA_DROME     STANDARD;      PRT;   588 AA.
AC   P07713;
DT   01-APR-1988 (REL. 07, CREATED)
DT   01-APR-1988 (REL. 07, LAST SEQUENCE UPDATE)
DT   01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)
DE   DECAPENTAPLEGIC PROTEIN PRECURSOR (DPP-C PROTEIN).
GN   DPP.
OS   DROSOPHILA MELANOGASTER (FRUIT FLY).
OC   EUKARYOTA; METAZOA; ARTHROPODA; INSECTA; DIPTERA.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   87090408
RA   PADGETT R.W., ST JOHNSTON R.D., GELBART W.M.;
RL   NATURE 325:81-84 (1987)
RN   [2]
RP   CHARACTERIZATION, AND SEQUENCE OF 457-476.
RM   90258853
RA   PANGANIBAN G.E.F., RASHKA K.E., NEITZEL M.D., HOFFMANN F.M.;
RL   MOL. CELL. BIOL. 10:2669-2677(1990).
CC   -!- FUNCTION: DPP IS REQUIRED FOR THE PROPER DEVELOPMENT OF THE
CC       EMBRYONIC DORSAL HYPODERM, FOR VIABILITY OF LARVAE AND FOR CELL
CC       VIABILITY OF THE EPITHELIAL CELLS IN THE IMAGINAL DISKS.
CC   -!- SUBUNIT: HOMODIMER, DISULFIDE-LINKED.
CC   -!- SIMILARITY: TO OTHER GROWTH FACTORS OF THE TGF-BETA FAMILY.
DR   EMBL; M30116; DMDPPC.
DR   PIR; A26158; A26158.
DR   HSSP; P08112; 1TFG.
DR   FLYBASE; FBGN0000490; DPP.
DR   PROSITE; PS00250; TGF_BETA.
KW   GROWTH FACTOR; DIFFERENTIATION; SIGNAL.
FT   SIGNAL        1      ?       POTENTIAL.
FT   PROPEP        ?    456
FT   CHAIN       457    588       DECAPENTAPLEGIC PROTEIN.
FT   DISULFID    487    553       BY SIMILARITY.
FT   DISULFID    516    585       BY SIMILARITY.
FT   DISULFID    520    587       BY SIMILARITY.
FT   DISULFID    552    552       INTERCHAIN (BY SIMILARITY).
FT   CARBOHYD    120    120       POTENTIAL.
FT   CARBOHYD    342    342       POTENTIAL.
FT   CARBOHYD    377    377       POTENTIAL.
FT   CARBOHYD    529    529       POTENTIAL.
SQ   SEQUENCE   588 AA;  65850 MW;  1768420 CN;
     MRAWLLLLAV LATFQTIVRV ASTEDISQRF IAAIAPVAAH IPLASASGSG SGRSGSRSVG
     ASTSTAGAKA FNRFSEPASF SDSDKSHRSK TNKKPSKSDA NRQFNEVHKP RTDQLENSKN
     KSKQLVNKPN HNKMAVKEQR SHHKKSHHHR SHQPKQASAS TESHQSSSIE SIFVEEPTLV
     LDREVASINV PANAKAIIAE QGPSTYSKEA LIKDKLKPDP STYLVEIKSL LSLFNMKRPP
     KIDRSKIIIP EPMKKLYAEI MGHELDSVNI PKPGLLTKSA NTVRSFTHKD SKIDDRFPHH
     HRFRLHFDVK SIPADEKLKA AELQLTRDAL SQQVVASRSS ANRTRYQBLV YDITRVGVRG
     QREPSYLLLD TKTBRLNSTD TVSLDVQPAV DRWLASPQRN YGLLVEVRTV RSLKPAPHHH
     VRLRRSADEA HERWQHKQPL LFTYTDDGRH DARSIRDVSG GEGGGKGGRN KRHARRPTRR
     KNHDDTCRRH SLYVDFSDVG WDDWIVAPLG YDAYYCHGKC PFPLADHRNS TNHAVVQTLV
     NNMNPGKBPK ACCBPTQLDS VAMLYLNDQS TVVLKNYQEM TVVGCGCR
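
Every line of this flat-file entry starts with a two-letter code, which makes it straightforward to read programmatically. The Python sketch below is a minimal reader under stated assumptions: `parse_swissprot` is an illustrative name, the excerpt contains only a few lines of the entry, and its sequence is truncated to the first 20 residues to keep the example short.

```python
from collections import defaultdict

# Shortened excerpt of the entry above (sequence truncated for brevity).
EXCERPT = """\
ID   DECA_DROME     STANDARD;      PRT;   588 AA.
AC   P07713;
DR   EMBL; M30116; DMDPPC.
DR   PROSITE; PS00250; TGF_BETA.
FT   DISULFID    487    553       BY SIMILARITY.
SQ   SEQUENCE   588 AA;  65850 MW;  1768420 CN;
     MRAWLLLLAV LATFQTIVRV
//
"""


def parse_swissprot(text):
    """Group lines by their two-letter code and collect the sequence."""
    records = defaultdict(list)
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            break
        if in_sequence:                # lines after SQ hold the residues
            residues.append(line.replace(" ", ""))
        elif line.strip():
            code, value = line[:2], line[2:].strip()
            records[code].append(value)
            in_sequence = code == "SQ"
    return dict(records), "".join(residues)


records, sequence = parse_swissprot(EXCERPT)
cross_refs = [
    tuple(part.strip() for part in dr.rstrip(".").split(";"))
    for dr in records["DR"]
]
print(records["AC"])  # ['P07713;']
print(cross_refs[0])  # ('EMBL', 'M30116', 'DMDPPC')
print(sequence)       # MRAWLLLLAVLATFQTIVRV
```

The first DR cross-reference carries the same accession number, M30116, as the GenBank entry for the gene.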


Functional classification of E. coli genes according to Monica Riley

I. Intermediary metabolism
   A. Degradation
   B. Central intermediary metabolism
   C. Respiration (aerobic and anaerobic)
   D. Fermentation
   E. ATP-proton motive force interconversions
   F. Broad regulatory functions

II. Biosynthesis of small molecules
   A. Amino acids
   B. Nucleotides
   C. Sugars and sugar molecules
   D. Cofactors, prosthetic groups, electron carriers
   E. Fatty acids and lipids
   F. Polyamines

III. Macromolecule metabolism
   A. Synthesis and modification
   B. Degradation of macromolecules

IV. Cell structure
   A. Membrane components
   B. Murein sacculus
   C. Surface polysaccharides and antigens
   D. Surface structures

V. Cellular processes
   A. Transport/binding proteins
   B. Cell division
   C. Chemotaxis and mobility
   D. Protein secretion
   E. Osmotic adaptations

VI. Other functions
   A. Cryptic genes
   B. Phage-related functions and prophages
   C. Colicin-related functions
   D. Plasmid-related functions
   E. Drug/analog sensitivity
   F. Radiation sensitivity
   G. DNA sites
   H. Adaptations to atypical conditions


The Protein Folding Problem


Protein Folding Problem (Sequence → 3D Structure)

1. Protein folding is thermodynamically determined (Anfinsen's thermodynamic principle)

Protein + Environment

2. Protein folding is a reaction involving other interacting molecules (Principle of molecular interactions)

Protein + Chaperonins + ….


Central Paradigm


Bioinformatics: A Long Journey
(How far are we away from knowing the God ??)

Sequence to exon                                80%    [Laub 98]
Exons to gene (without cDNA or homolog)         ~30%   [Laub 98]
Gene to regulation                              ~10%   [Hughes 00]
Regulated gene to protein sequence              98%    [Gesteland ]
Sequence to secondary-structure (α, β, c)       77%    [CASP]
Secondary-structure to 3D structure             25%    [CASP]
3D structure to ligand specificity              ~10%   [Johnson 99]

Expected accuracy overall ~= 0.8 * 0.3 * 0.1 * 0.98 * 0.77 * 0.25 * 0.1 = 0.0005 ?
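
The overall figure is the product of the per-step accuracies, which treats the steps as independent; that is the slide's own simplification. A short Python sketch reproduces the arithmetic, with the step labels shortened from the list above:

```python
import math

# Per-step accuracies quoted on the slide (approximate values taken as given).
STEP_ACCURACY = [
    ("Sequence to exon", 0.80),
    ("Exons to gene", 0.30),
    ("Gene to regulation", 0.10),
    ("Regulated gene to protein sequence", 0.98),
    ("Sequence to secondary structure", 0.77),
    ("Secondary structure to 3D structure", 0.25),
    ("3D structure to ligand specificity", 0.10),
]

# Multiplying assumes each step succeeds or fails independently of the others.
overall = math.prod(accuracy for _, accuracy in STEP_ACCURACY)
print(f"{overall:.5f}")  # 0.00045
```

The product is about 0.00045, which the slide rounds to .0005.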


Our Focus in Bioinformatics

Perturbation: Environment, Medication, Genetic Engineering

Dynamic Response: Gene Expression, Protein Expression

BioChip

DataBase: Genotype/Phenotype

Symbolic: Algorithms/Computing

Analysis

Biology: Molecular Biology, Biochemistry, Genetics

Virtual Cell

Genome Sequencing