61
www.bioalgorithms.info An Introduction to Bioinformatics Algorithms Molecular Biology Primer Molecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen, Mike Daly, Hoa Dinh, Erinn Hama, Robert Hinman, Julio Ng, Michael S dd H T J W Ch F Y Sneddon, Hoa Troung, Jerry Wang, Che Fung Yung

MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

  • Upload
    vutu

  • View
    214

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms

Molecular Biology PrimerMolecular Biology Primer

Angela Brooks, Raymond Brown, Calvin Chen, Mike Daly, Hoa Dinh, Erinn Hama, Robert Hinman, Julio Ng, Michael S dd H T J W Ch F YSneddon, Hoa Troung, Jerry Wang, Che Fung Yung

Page 2: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Section1: What is Life made of?Section1: What is Life made of?

Page 3: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Outline For Section 1:Outline For Section 1:

• All living things are made of Cells • Prokaryote, Eukaryotey y

• Cell Signaling• What is Inside the cell: From DNA to RNA to • What is Inside the cell: From DNA, to RNA, to

Proteins

Page 4: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

CellsCells

f• Fundamental working units of every living system. • Every organism is composed of one of two

radically different types of cells: radically different types of cells: prokaryotic cells or eukaryotic cells.eukaryotic cells.

• Prokaryotes and Eukaryotes are descended from the same primitive cell. • All extant prokaryotic and eukaryotic cells are the result of a total of 3.5

billion years of evolution.

Page 5: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Life begins with CellLife begins with Cell

• A cell is a smallest structural unit of an organism that is capable of independent functioning

• All cells have some common features

Page 6: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

2 types of cells: Prokaryotes2 types of cells: Prokaryotes v.s.Eukaryotes

Page 7: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Prokaryotes and EukaryotesProkaryotes and Eukaryotes, continuedProkaryotes Eukaryotes

Si l ll Si l lti llSingle cell Single or multi cell

No nucleus NucleusNo nucleus Nucleus

No organelles Organelles

One piece of circular DNA Chromosomes

No mRNA post transcriptional modification

Exons/Introns splicing

Page 8: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Prokaryotes v.s. Eukaryotesy yStructural differences

Prokaryotes Eubacterial (blue green algae)

Eukaryotes plants, animals, Protista, and fungi( g g )

and archaebacteria only one type of membrane--

plasma membrane forms

plants, animals, Protista, and fungi

complex systems of internal plasma membrane forms the boundary of the cell proper

The smallest cells known are b t i

membranes forms organelle and compartments

The volume of the cell is several bacteria Ecoli cell 3x106 protein molecules

The volume of the cell is several hundred times larger Hela cell

5 109 t i l l 1000-2000 polypeptide species. 5x109 protein molecules 5000-10,000 polypeptide species

Page 9: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Example of cell signalingExample of cell signaling

Page 10: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Overview of organizations of lifeOverview of organizations of life• Nucleus = library• Chromosomes = bookshelves• Genes = books

Al t ll i i t i th• Almost every cell in an organism contains the same libraries and the same sets of books.

• Books represent all the information (DNA) that every cell in the body needs so it can y ygrow and carry out its vaious functions.

Page 11: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Some TerminologySome Terminology

• Genome: an organism’s genetic material

• Gene: a discrete units of hereditary information located on the chromosomes and consisting of DNA.chromosomes and consisting of DNA.

• Genotype: The genetic makeup of an organism

• Phenotype: the physical expressed traits of an organism

• Nucleic acid: Biological molecules(RNA and DNA) that allow organisms to reproduce;

Page 12: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

More TerminologyMore Terminology

• The genome is an organism’s complete set of DNA.• a bacteria contains about 600,000 DNA base pairs• human and mouse genomes have some 3 billion.

• human genome has 24 distinct chromosomes• human genome has 24 distinct chromosomes.• Each chromosome contains many genes.

• Gene• basic physical and functional units of heredity. • specific sequences of DNA bases that encode

instructions on how to make proteinsinstructions on how to make proteins.• Proteins

• Make up the cellular structurep• large, complex molecules made up of smaller subunits

called amino acids.

Page 13: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

All Life depends on 3 critical moleculesAll Life depends on 3 critical molecules

DNA• DNAs• Hold information on how cell worksRNA• RNAs• Act to transfer short pieces of information to different parts

of cellof cell• Provide templates to synthesize into protein

• Proteins• Proteins• Form enzymes that send signals to other cells and regulate

gene activityg y• Form body’s major components (e.g. hair, skin, etc.)

Page 14: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

DNA: The Code of LifeDNA: The Code of Life

• The structure and the four genomic letters code for all living organismsorganisms

• Adenine, Guanine, Thymine, and Cytosine which pair A-T and C-G on complimentary strands.

Page 15: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

DNA RNA and the Flow ofDNA, RNA, and the Flow of Information

Replication

TranslationTranscription TranslationTranscription

Page 16: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Overview of DNA to RNA to ProteinOverview of DNA to RNA to Protein

• A gene is expressed in two steps1) Transcription: RNA synthesis2) Translation: Protein synthesis2) Translation: Protein synthesis

Page 17: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Cell Information: Instruction book ofCell Information: Instruction book of Life

DNA RNA d• DNA, RNA, and Proteins are examples of strings written inof strings written in either the four-letter nucleotide of DNA and RNA (A C G T/U)

• or the twenty-letter i id f t iamino acid of proteins.

Each amino acid is coded by 3 nucleotidescoded by 3 nucleotides called codon. (Leu, Arg, Met, etc.)

Page 18: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Genetic Information: Chromosomes

• (1) Double helix DNA strand. • (2) Chromatin strand (DNA with histones)

(3) C d d h ti d i i t h ith t • (3) Condensed chromatin during interphase with centromere. • (4) Condensed chromatin during prophase • (5) Chromosome during metaphase( ) g p

Page 19: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Genes Make ProteinsGenes Make Proteins

i (f ll l l & lif• genome-> genes ->protein(forms cellular structural & life functional)->pathways & physiology

Page 20: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Proteins: Workhorses of the CellProteins: Workhorses of the Cell• 20 different amino acids

diff t h i l ti th t i h i t f ld • different chemical properties cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.

• Proteins do all essential work for the cell• Proteins do all essential work for the cell• build cellular structures• digest nutrients • execute metabolic functions• Mediate information flow within a cell and among cellular

communities. • Proteins work together with other proteins or nucleic acids as

"molecular machines" • structures that fit together and function in highly structures that fit together and function in highly

specific, lock-and-key ways.

Page 21: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Transcriptional RegulationTranscriptional RegulationSWI/SNF

SWI5

RNA Pol IITATA BPGENERAL TFs

Lodish et al Molecular Biology of the Cell (5th ed ) W H Freeman & Co 2003Lodish et al. Molecular Biology of the Cell (5 ed.). W.H. Freeman & Co., 2003.

Page 22: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

The Histone CodeThe Histone Code • State of histone tails govern TF access to DNA

• State is governed by amino acid sequence and modification (acetylation, phosphorylation, methylation)

Lodish et al Molecular Biology of the Cell (5th ed ) W H Freeman & Co 2003Lodish et al. Molecular Biology of the Cell (5 ed.). W.H. Freeman & Co., 2003.

Page 23: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

C t l D f Bi lCentral Dogma of Biologyf fThe information for making proteins is stored in DNA. There is

a process (transcription and translation) by which DNA is converted to protein. By understanding this process and how itconverted to protein. By understanding this process and how it is regulated we can make predictions and models of cells.

Assembly

Protein

Sequence analysis

Gene Finding

Protein Sequence Analysis

g

Page 24: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

RNARNA• RNA is similar to DNA chemically. It is usually only

i l t d T(h i ) i l d b U( il)a single strand. T(hyamine) is replaced by U(racil)• Some forms of RNA can form secondary structures

b “ i i ” ith it lf Thi h h itby “pairing up” with itself. This can have change its properties

dramatically.DNA and RNADNA and RNAcan pair with each other.

http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.giftRNA linear and 3D view:

Page 25: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

RNA continuedRNA, continued• Several types exist, classified by function• mRNA – this is what is usually being referred y g

to when a Bioinformatician says “RNA”. This is used to carry a gene’s message out of theis used to carry a gene s message out of the nucleus.tRNA transfers genetic information from• tRNA – transfers genetic information from mRNA to an amino acid sequence

• rRNA – ribosomal RNA. Part of the ribosome which is involved in translation.

Page 26: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Terminology for TranscriptionTerminology for Transcription

h RNA (h l RNA) E k i RNA i• hnRNA (heterogeneous nuclear RNA): Eukaryotic mRNA primary transcipts whose introns have not yet been excised (pre-mRNA).

• Phosphodiester Bond: Esterification linkage between a phosphate group and two alcohol groups.

• Promoter: A special sequence of nucleotides indicating the starting point for RNA synthesis.p y

• RNA (ribonucleotide): Nucleotides A,U,G, and C with ribose• RNA Polymerase II: Multisubunit enzyme that catalyzes the

synthesis of an RNA molecule on a DNA template from nucleosidesynthesis of an RNA molecule on a DNA template from nucleoside triphosphate precursors.

• Terminator: Signal in DNA that halts transcription.

Page 27: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

TranscriptionTranscriptionTh f ki• The process of making RNA from DNA

• Catalyzed by• Catalyzed by “transcriptase” enzyme

• Needs a promoter• Needs a promoter region to begin transcription.p

• ~50 base pairs/second in bacteria, but multiple

i itranscriptions can occur simultaneously

http://ghs.gresham.k12.or.us/science/ps/sci/ibbio/chem/nucleic/chpt15/transcription.gif

Page 28: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

DNA RNA: TranscriptionDNA RNA: TranscriptionDNA t t ib d b• DNA gets transcribed by a protein known as RNA-polymerasepolymerase

• This process builds a chain of bases that will become mRNAbases that will become mRNA

• RNA and DNA are similar, except that RNA is single p gstranded and thus less stable than DNA

Al i RNA th b il (U) i• Also, in RNA, the base uracil (U) is used instead of thymine (T), the DNA counterpart

Page 29: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Definition of a GeneDefinition of a Gene

• Regulatory regions: up to 50 kb upstream of +1 site

• Exons: protein coding and untranslated regions (UTR)1 to 178 exons per gene (mean 8.8)8 b t 17 kb ( 145 b )8 bp to 17 kb per exon (mean 145 bp)

• Introns: splice acceptor and donor sites, junk DNAaverage 1 kb – 50 kb per intron

• Gene size: Largest – 2.4 Mb (Dystrophin). Mean – 27 kb.

Page 30: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Central Dogma RevisitedCentral Dogma RevisitedSplicingTranscription

DNA hnRNA mRNAp g

Spliceosome

Transcription

Nucleus Spliceosome

Translation

Nucleus

proteinTranslation

Ribosome in Cytoplasm• Base Pairing Rule: A and T or U is held together by

2 hydrogen bonds and G and C is held together by 3 y g g yhydrogen bonds.

• Note: Some mRNA stays as RNA (ie tRNA,rRNA).y ( , )

Page 31: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Terminology for SplicingTerminology for Splicing• Exon: A portion of the gene that appears in p g pp

both the primary and the mature mRNA transcripts.

• Intron: A portion of the gene that is transcribed but excised prior to translation. p

• Lariat structure: The structure that an intron in mRNA takes during excision/splicing.in mRNA takes during excision/splicing.

• Spliceosome: A organelle that carries out the splicing reactions whereby the pre-mRNA issplicing reactions whereby the pre mRNA is converted to a mature mRNA.

Page 32: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

SplicingSplicing

Page 33: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Splicing: hnRNA mRNASplicing: hnRNA mRNA Takes place on spliceosome

that brings together a hnRNA, g gsnRNPs, and a variety of pre-mRNA binding proteins.

• 2 transesterification reactions:1. 2’,5’ phosphodiester bond forms

between an intron adenosine residue and the intron’s 5’-terminal phosphate group and a lariat structure is formed.

2. The free 3’-OH group of the 5’ g pexon displaces the 3’ end of the intron, forming a phosphodiester bond with the 5’ terminal phosphate of the 3’ exon to yield the spliced product. The lariat formed i t i th d d dintron is the degraded.

Page 34: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Splicing and other RNA processingSplicing and other RNA processing

• In Eukaryotic cells, RNA is processed between transcription and translation.

• This complicates the relationship between a DNA gene and the protein it codes forDNA gene and the protein it codes for.

• Sometimes alternate RNA processing can lead to an alternate protein as a res lt Thislead to an alternate protein as a result. This is true in the immune system.

Page 35: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Splicing (Eukaryotes)Splicing (Eukaryotes)• Unprocessed RNA is

composed of Introns and Extrons. Introns are removed before the rest isremoved before the rest is expressed and converted to protein.p

• Sometimes alternate splicings can create different valid proteins.

• A typical Eukaryotic gene h 4 20 i t L tihas 4-20 introns. Locating them by analytical means is not easyis not easy.

Page 36: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Posttranscriptional Processing: Capping Posttranscriptional Processing: Capping and Poly(A) Tail Poly(A) TailCapping• Prevents 5’ exonucleolytic

degradation

Poly(A) Tail• Due to transcription termination

process being imprecise.• 2 reactions to append:degradation.

• 3 reactions to cap:1. Phosphatase removes 1

pp1. Transcript cleaved 15-25 past

highly conserved AAUAAA sequence and less than 50 nucleotides before less1. Phosphatase removes 1

phosphate from 5’ end of hnRNA

2 G f

nucleotides before less conserved U rich or GU rich sequences.

2. Poly(A) tail generated from ATP2. Guanyl transferase adds a

GMP in reverse linkage 5’ to 5’.

2. Poly(A) tail generated from ATP by poly(A) polymerase which is activated by cleavage and polyadenylation specificity factor (CPSF) when CPSF recognizes

3. Methyl transferase adds methyl group to guanosine.

(CPSF) when CPSF recognizes AAUAAA. Once poly(A) tail has grown approximately 10 residues, CPSF disengages f th iti itfrom the recognition site.

Page 37: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Terminology for Protein FoldingTerminology for Protein Folding• Endoplasmic Reticulum: MembraneousEndoplasmic Reticulum: Membraneous

organelle in eukaryotic cells where lipid synthesis and some posttranslational y pmodification occurs.

• Mitochondria: Eukaryotic organelle whereMitochondria: Eukaryotic organelle where citric acid cycle, fatty acid oxidation, and oxidative phosphorylation occur.oxidative phosphorylation occur.

• Molecular chaperone: Protein that binds to unfolded or misfolded proteins to refold theunfolded or misfolded proteins to refold the proteins in the quaternary structure.

Page 38: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Uncovering the codeUncovering the code• Scientists conjectured that proteins came from DNA;

but how did DNA code for proteins?• If one nucleotide codes for one amino acid, then

there’d be 41 amino acids• However, there are 20 amino acids, so at least 3

bases codes for one amino acid, since 42 = 16 and 43 6443 = 64• This triplet of bases is called a “codon”

64 diff t d d l 20 i id th t• 64 different codons and only 20 amino acids means that the coding is degenerate: more than one codon sequence code for the same amino acid

Page 39: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Protein FoldingProtein Folding• Proteins tend to fold into the lowest

free energy conformationfree energy conformation.• Proteins begin to fold while the

peptide is still being translated.Proteins bury most of its hydrophobic• Proteins bury most of its hydrophobic residues in an interior core to form an α helix.M t t i t k th f f• Most proteins take the form of secondary structures α helices and βsheets.

60• Molecular chaperones, hsp60 and hsp 70, work with other proteins to help fold newly synthesized proteins.

• Much of the protein modifications and folding occurs in the endoplasmic reticulum and mitochondria.

Page 40: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Protein FoldingProtein Folding• Proteins are not linear structures, though they are

built that way• The amino acids have very different chemical

properties; they interact with each other after the t i i b iltprotein is built

• This causes the protein to start fold and adopting it’s functional structurefunctional structure

• Proteins may fold in reaction to some ions, and several separate chains of peptides may join together through their p p p y j g ghydrophobic and hydrophilic amino acids to form a polymer

Page 41: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Protein Folding ( ’d)Protein Folding (cont’d)

Th t t th t• The structure that a protein adopts is vital to it’s chemistryit s chemistry

• Its structure determines which of its amino acidswhich of its amino acids are exposed carry out the protein’s function

• Its structure also determines what substrates it can reactsubstrates it can react with

Page 42: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

BioinformaticsBioinformaticsSequence Driven Problemsq• Proteomics

• Identification of functional domains in protein’s sequence

• Determining functional pieces in proteins.• Protein Folding

• 1D Sequence → 3D Structure• What drives this process?

Page 43: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

ProteinsProteins• Carry out the cell's chemistry

• 20 amino acids• 20 amino acids• A more complex polymer than DNA

• Sequence of 100 has 20100 combinations• Sequence analysis is difficult because of complexity issue• Only a small number of the possible sequences are actually used in

life. (Strong argument for Evolution)life. (Strong argument for Evolution)• RNA Translated to Protein, then Folded

• Sequence to 3D structure (Protein Folding Problem)• Translation occurs on Ribosomes• 3 letters of DNA → 1 amino acid

• 64 possible combinations map to 20 amino acids64 possible combinations map to 20 amino acids • Degeneracy of the genetic code

• Several codons to same protein

Page 44: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Structure to FunctionStructure to Function• Organic chemistry shows us that the

structure of the molecules determines their possible reactions.

• One approach to study proteins is to inferOne approach to study proteins is to infer their function based on their structure, especially for active sitesespecially for active sites.

Page 45: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Two Quick BioinformaticsTwo Quick Bioinformatics Applicationspp• BLAST (Basic Local Alignment Search Tool)• PROSITE (Protein Sites and Patterns (

Database)

Page 46: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

BLASTBLAST• A computational tool that allows us to

compare query sequences with entries in current biological databases.

• A great tool for predicting functions of aA great tool for predicting functions of a unknown sequence based on alignment similarities to known genessimilarities to known genes.

Page 47: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

BLASTBLAST

Page 48: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Some Early Roles of BioinformaticsSome Early Roles of Bioinformatics

• Sequence comparison• Searches in sequence databasesq

Page 49: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Biological Sequence ComparisonBiological Sequence Comparison

• Needleman- Wunsch, 1970• Dynamic programming

algorithm to align sequencessequences

Page 50: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Early Sequence MatchingEarly Sequence Matching

• Finding locations of restriction sites of known restriction enzymes within a DNA sequence (very trivial application)

• Alignment of protein sequence with scoring motif• Generating contiguous sequences from short DNA

fragments.• This technique was used together with PCR and automated

HT i t t th t fHT sequencing to create the enormous amount of sequence data we have today

Page 51: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Biological DatabasesBiological Databases

• Vast biological and sequence data is freely available through online databasesU t ti l l ith t ffi i tl t l t• Use computational algorithms to efficiently store large amounts of biological data

Examples

• NCBI GeneBank http://ncbi.nih.govHuge collection of databases, the most prominent being the nucleotide sequence database

• Protein Data Bank http://• Protein Data Bank http://www.pdb.org

Database of protein tertiary structures

• SWISSPROT http://www.expasy.org/sprot/ • Database of annotated protein sequencesDatabase of annotated protein sequences

• PROSITE http://kr.expasy.org/prositeDatabase of protein active site motifs

Page 52: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

PROSITE DatabasePROSITE Database• Database of protein active sites.• A great tool for predicting the existence of g p g

active sites in an unknown protein based on primary sequenceprimary sequence.

Page 53: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

PROSITEPROSITE

Page 54: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Sequence AnalysisSequence AnalysisS l ith l bi l i l• Some algorithms analyze biological sequences for patterns• RNA splice sites• ORFs• Amino acid propensities in a protein• Conserved regions in

• AA sequences [possible active site]• DNA/RNA [possible protein binding site]

Oth k di ti b d• Others make predictions based on sequence• Protein/RNA secondary structure folding

Page 55: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

It is Sequenced What’s Next?It is Sequenced, What s Next?• Tracing Phylogenyg y g y

• Finding family relationships between species by tracking similarities between species.g p

• Gene Annotation (cooperative genomics)• Comparison of similar speciesComparison of similar species.

• Determining Regulatory Networks• The variables that determine how the body reacts• The variables that determine how the body reacts

to certain stimuli.Proteomics• Proteomics• From DNA sequence to a folded protein.

Page 56: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

ModelingModeling• Modeling biological processes tells us if we

understand a given process• Because of the large number of variables that

exist in biological problems powerfulexist in biological problems, powerful computers are needed to analyze certain biological questionsbiological questions

Page 57: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Protein ModelingProtein ModelingQ ant m chemistr imaging algorithms of acti e• Quantum chemistry imaging algorithms of active sites allow us to view possible bonding and reaction mechanisms

• Homologous protein modeling is a comparative proteomic approach to determining an unknown protein’s tertiary structureprotein s tertiary structure

• Predictive tertiary folding algorithms are a long way off, but we can predict secondary structure with , p y~80% accuracy.

The most accurate online prediction tools: PSIPredPSIPredPHD

Page 58: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Regulatory Network ModelingRegulatory Network Modeling • Micro array experiments allow us to compare

differences in expression for two different states

• Algorithms for clustering groups of geneAlgorithms for clustering groups of gene expression help point out possible regulatory networksnetworks

• Other algorithms perform statistical analysis t i i l t i t tto improve signal to noise contrast

Page 59: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Systems Biology ModelingSystems Biology Modeling• Predictions of whole cell interactions.

• Organelle processes, expression modelingg g

• Currently feasible for specific processes (eg• Currently feasible for specific processes (eg. Metabolism in E. coli, simple cells)

Flux Balance Analysis

Page 60: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

The futureThe future…• Bioinformatics is still in it’s infancy• Much is still to be learned about how proteins p

can manipulate a sequence of base pairs in such a peculiar way that results in a fullysuch a peculiar way that results in a fully functional organism.Ho can e then se this information to• How can we then use this information to benefit humanity without abusing it?

Page 61: MolecularBiologyPrimerMolecular Biology Primer - unibo.it Neurali Teoria e Applicazioni... · MolecularBiologyPrimerMolecular Biology Primer Angela Brooks, Raymond Brown, Calvin Chen,

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info

Sources CitedSources Cited• Daniel Sam “Greedy Algorithm” presentation• Daniel Sam, Greedy Algorithm presentation.• Glenn Tesler, “Genome Rearrangements in Mammalian Evolution:

Lessons from Human and Mouse Genomes” presentation.• Ernst Mayr, “What evolution is”.Ernst Mayr, What evolution is .• Neil C. Jones, Pavel A. Pevzner, “An Introduction to Bioinformatics

Algorithms”.• Alberts, Bruce, Alexander Johnson, Julian Lewis, Martin Raff, Keith

R b t P t W lt M l l Bi l f th C ll N Y k G l dRoberts, Peter Walter. Molecular Biology of the Cell. New York: Garland Science. 2002.

• Mount, Ellis, Barbara A. List. Milestones in Science & Technology. Phoenix: The Oryx Press. 1994.y

• Voet, Donald, Judith Voet, Charlotte Pratt. Fundamentals of Biochemistry. New Jersey: John Wiley & Sons, Inc. 2002.

• Campbell, Neil. Biology, Third Edition. The Benjamin/Cummings Publishing C I 1993Company, Inc., 1993.

• Snustad, Peter and Simmons, Michael. Principles of Genetics. John Wiley & Sons, Inc, 2003.