

Graduate Computational Genomics

02-710 / 10-810 & MSCBIO2070

Elements of Molecular Biology

Takis Benos

Lecture #6a, February 1, 2007

Reading: hand-outs

Benos 02-710/MSCBIO2070 1-FEB-2007 2

Sequence analysis (6 lectures)

• A little biology…
• …and statistics (conditional, Markov chains, HMMs)
• Biological sequence “matchmaking”
• Evolution of DNA and protein sequences - Distances
• Pairwise and multiple sequence analysis
• Algorithms for database search
• Gene finding
• DNA motif discovery
• cis-regulatory motifs and modules
• microRNA genes


Sequence analysis (6 lectures)

What we will not talk about in these lectures:

• Genome sequencing assembly
• Clustering/classification (K-means, SVMs, etc.)
• RNA folding

Outline of the biology part

• Basic Definitions
• Cells’ basic components
• Basic characteristics of DNA & Proteins
• Transcription and Translation: Central Dogma
• Other Features of Genetic Sequence
• Molecular Evolution


Cells’ components

• Cells are complex
• We look at a simplified version:
    • Extracellular environment
    • Membrane
    • Cytoplasm
    • Nucleus (in eukaryotes)

DNA - Chromosomes - Genes

[Figure: chromosome, with the p arm, centromere, and q arm labeled]


DNA - Chromosomes - Genes

5’ - A T C G G T - 3’
     | | | | | |
3’ - T A G C C A - 5’

5’ - A C C G A T - 3’
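The third strand on this slide is the lower strand rewritten in the conventional 5’ to 3’ direction, i.e. the reverse complement of the top strand. A minimal Python sketch of that operation follows. The name `reverse_complement` is illustrative rather than taken from the lecture, and the code assumes an uppercase sequence containing only A, C, G, and T.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read 5'->3'."""
    # Complement every base, then reverse to restore 5'->3' orientation.
    return seq.translate(COMPLEMENT)[::-1]


top = "ATCGGT"                      # 5'-ATCGGT-3' from the slide
print(top.translate(COMPLEMENT))    # TAGCCA (lower strand as drawn, 3'->5')
print(reverse_complement(top))      # ACCGAT (lower strand read 5'->3')
```

Applying the function twice returns the original strand, which is a quick sanity check on any implementation.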

What is a “gene”?

• We cannot define it (but we know it when we see it…)

• A loose definition:

“Gene” is a DNA information unit that is able to perform a function in a cellular environment


Central Dogma (and beyond)

• Human gDNA: ~3 x 10^9 bp; contains ~22,000 genes

[Figure: central dogma diagram showing a DNA sequence (G A C A G C), messenger-RNA, and the steps transcription, translation, and folding]

Slide courtesy: Serafim Batzoglou

Protein coding genes

[Figure: protein-coding gene, with the 5’ and 3’ ends labeled]


Genes and Proteins

• amino acids
• proteins
• the genetic code
• chart of code
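To make the link between genes, the genetic code, and proteins concrete, the following Python sketch builds the standard codon table and uses it to transcribe and translate a short sequence. The function names and the example sequence are made up for illustration. The sketch assumes the standard genetic code and an uppercase sequence, and it reads from the first base rather than searching for a start codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding strand's sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed on DNA letters
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


dna = "ATGGCCAAGTAA"          # made-up example: ATG GCC AAG TAA
mrna = transcribe(dna)
print(mrna)                   # AUGGCCAAGUAA
print(translate(mrna))        # MAK
```

Because 64 codons encode 20 amino acids plus stop signals, several codons map to the same amino acid; this is visible in the repeated letters of `AMINO_ACIDS`.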

Amino acid varieties

Slide courtesy: Serafim Batzoglou


Amino acid varieties (cont’d)

Slide courtesy: Serafim Batzoglou

Categories of living organisms

[Figure: tree of living organisms, with the labels Prokaryotes, Eukaryotes, and 2-4 billion yrs]

Source: http://www.bmb.psu.edu/courses/micro401/default.htm


Prokaryotes vs. Eukaryotes

                Prokaryotes                          Eukaryotes
Cell            Cell walls, no nucleus/organelles    Cell membranes, nucleus/organelles
Transcription   Simple                               Complex
Gene            Simple (no introns, short UTRs)      Complex
mRNA            Polycistronic                        Monocistronic (mostly)

Genome logistics: viruses and prokaryotes

Organism         Size (bp x 10^6)   No. of prot. genes
HIV-1            0.1                8
phage            0.05               71
E. coli          4.7                3,200
H. influenzae    1.8                1,700


Genome logistics: eukaryotes

Organism           Size (bp x 10^6)   No. of prot. genes
S. cerevisiae      12                 6,300
C. elegans         97                 19,100
A. thaliana        125                25,500
D. melanogaster    180                13,600
H. sapiens         2,900              25,000
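One way to read the two genome tables together is to compute gene density, the number of protein-coding genes per million base pairs. The Python sketch below does this for the cellular organisms listed; the numbers are copied from the slides, while the dictionary layout and the name `genes_per_mb` are illustrative choices.

```python
# (genome size in Mb, number of protein-coding genes), copied from the two tables.
GENOMES = {
    "E. coli": (4.7, 3200),
    "H. influenzae": (1.8, 1700),
    "S. cerevisiae": (12, 6300),
    "C. elegans": (97, 19100),
    "A. thaliana": (125, 25500),
    "D. melanogaster": (180, 13600),
    "H. sapiens": (2900, 25000),
}


def genes_per_mb(size_mb: float, n_genes: int) -> float:
    """Protein-coding genes per million base pairs."""
    return n_genes / size_mb


for organism, (size_mb, n_genes) in GENOMES.items():
    print(f"{organism:<16}{genes_per_mb(size_mb, n_genes):8.1f} genes/Mb")
```

With these figures, the bacteria come out at several hundred genes per Mb, while H. sapiens has fewer than ten: genome size grows far faster than gene count.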

Gene structure: prokaryotes

[Figure: prokaryotic gene cluster with genes labeled z, y, a; promoter with a -35 box and a -10 box separated by 17 bp; mRNA; proteins]
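The promoter layout on this slide (a -35 box and a -10 box about 17 bp apart) can be turned into a simple sequence scan. The Python sketch below is a rough illustration under several assumptions that are not on the slide: it uses the textbook E. coli consensus hexamers TTGACA and TATAAT, requires exact matches, and allows a 16-18 bp spacer. Real promoters usually deviate from the consensus, so practical tools score candidates with weight matrices instead. The name `find_promoters` and the test sequence are hypothetical.

```python
import re

# Assumed consensus hexamers (not stated on the slide) and a 16-18 bp spacer.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"
# Lookahead so that overlapping candidates are all reported.
PROMOTER = re.compile(rf"(?=({MINUS_35})([ACGT]{{16,18}})({MINUS_10}))")


def find_promoters(seq: str):
    """Yield (start of -35 box, spacer length) for each exact consensus match."""
    for m in PROMOTER.finditer(seq):
        yield m.start(1), len(m.group(2))


# Made-up sequence: 5 bp of flank, -35 box, 17 bp spacer, -10 box, 6 bp of flank.
seq = "GGCCA" + "TTGACA" + "ACGTACGTACGTACGTA" + "TATAAT" + "GCTAGC"
print(list(find_promoters(seq)))   # [(5, 17)]
```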


Gene structure: eukaryotes

[Figure: eukaryotic gene with CAAT (-100) and TATA (-30) elements in the “core” promoter, followed by exon-1, intron-1, exon-2, intron-2, exon-3]

RNA Splicing

[Figure: eukaryotic gene with CAAT (-100) and TATA (-30) elements; spliced mRNA consisting of 5’ UTR, CDS, and 3’ UTR]
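Splicing removes the introns from the primary transcript and joins the exons into the mature mRNA. A minimal Python sketch of that step is shown here, assuming the exon coordinates are already known. The function name, the coordinates, and the sequence are made up; the toy introns start with GU and end with AG, following the common splice-site pattern, which is not stated on the slide.

```python
def splice(pre_mrna, exons):
    """Join exon segments, given as 0-based half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up pre-mRNA: exons in upper case, introns in lower case for readability.
pre_mrna = "AUGGCC" + "guaaguuuuuag" + "AAGGAU" + "gugaguccccag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]     # exon-1, exon-2, exon-3

print(splice(pre_mrna, exons))           # AUGGCCAAGGAUUGGUAA
```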


Translation

[Figure: eukaryotic gene with CAAT (-100) and TATA (-30) elements; mRNA with a 5’ CAP and a poly-A tail (AAAAAAAAAAA); coding region running from ATG to TAA; protein]
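The slide marks the coding region as the stretch from the start codon ATG to the stop codon TAA. The Python sketch below locates such a region in a transcript written in DNA letters, as on the slide. It is a simplification under stated assumptions: it takes the first ATG that has an in-frame stop codon (TAA, TAG, or TGA), whereas real start-site selection also depends on sequence context. The name `first_orf` and the example transcript are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str):
    """Return (start, end) of the first ATG...stop reading frame, or None.

    `end` is exclusive and includes the stop codon.
    """
    start = seq.find("ATG")
    while start != -1:
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop after this ATG; try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


# Made-up transcript: 5' UTR, CDS (ATG ... TAA), 3' UTR, poly-A tail.
transcript = "GGCACC" + "ATGGCCAAGGATTGGTAA" + "CTCGAG" + "AAAAAAAAAAA"
start, end = first_orf(transcript)
print(transcript[:start], transcript[start:end], transcript[end:])
```

The three printed pieces correspond to the 5’ UTR, the CDS, and the 3’ UTR plus poly-A tail.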

Alternative splicing

[Figure: eukaryotic gene with CAAT (-100) and TATA (-30) elements; two alternatively spliced products, mRNA and mRNA (transcript-2)]
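Alternative splicing lets one gene produce more than one mRNA. As a rough illustration, the Python sketch below enumerates the transcripts obtainable from a three-exon gene by exon skipping. It assumes that the first and last exons are always kept and that each internal exon can be skipped independently. Exon skipping is only one form of alternative splicing, and the name `isoforms` is hypothetical.

```python
from itertools import combinations


def isoforms(exons):
    """List exon combinations that keep the first and last exon, in genomic order.

    Simplifying assumption: every internal exon may be independently skipped.
    """
    first, *internal, last = exons
    result = []
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            result.append([first, *kept, last])
    return result


for n, transcript in enumerate(isoforms(["exon-1", "exon-2", "exon-3"]), start=1):
    print(f"transcript-{n}:", " + ".join(transcript))
# transcript-1: exon-1 + exon-2 + exon-3
# transcript-2: exon-1 + exon-3
```

Under this model a gene with n internal exons has 2^n possible transcripts, so the number of candidate isoforms grows quickly with exon count.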


Transcription regulation

• promoter region
• expression levels
• degradation
• post modifications


Non-coding genes

• tRNA(*)
• ribosomal RNA(*)
• snoRNA(*)
• microRNA etc.

Source: http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookPROTSYn.html

(*) Not to be discussed in this course.

Other DNA elements

• transposable elements(*)
• repetitive DNA(*)
• “junk” DNA(*)

(*) Not to be discussed in this course.