Genome Org1

  • 8/2/2019 Genome Org1

    Genome Organization & Protein Synthesis and Processing in Plants

    Viral genomes

    Viral genomes: ssRNA, dsRNA, ssDNA or dsDNA; linear or circular

    Viruses with RNA genomes: almost all plant viruses and some bacterial and animal viruses. Genomes are rather small (a few thousand nucleotides).

    Viruses with DNA genomes (e.g. lambda = 48,502 bp): often a circular genome.

    Replicative form of viral genomes: all ssRNA viruses produce dsRNA molecules; many linear DNA molecules become circular.

    Molecular weight and contour length: duplex length per base pair = 3.4 Å; molecular weight per base pair = ~660 Da.
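    The two conversion factors above are enough to turn a genome size into a physical length and a mass. This is a minimal sketch. It assumes B-form duplex DNA (0.34 nm rise per base pair) and the ~660 Da average per base pair quoted above. The function names are illustrative only.

```python
# Conversion factors for duplex (B-form) DNA, as quoted on the slide
RISE_NM_PER_BP = 0.34   # 3.4 Å = 0.34 nm per base pair
DALTONS_PER_BP = 660    # approximate average mass of one base pair


def contour_length_um(bp):
    """Contour length, in micrometres, of a duplex with `bp` base pairs."""
    return bp * RISE_NM_PER_BP / 1000.0


def molecular_weight_da(bp):
    """Approximate molecular weight, in daltons, of a duplex with `bp` base pairs."""
    return bp * DALTONS_PER_BP


lambda_bp = 48_502  # phage lambda genome size from the text
print(f"{contour_length_um(lambda_bp):.1f} um")    # 16.5 um
print(f"{molecular_weight_da(lambda_bp):.2e} Da")  # 3.20e+07 Da
```

    The same two functions apply to any duplex genome. For example, a 4.6 x 10^6 bp bacterial chromosome comes out at roughly 1.6 mm.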

    Procaryotic genomes

    Generally one circular chromosome (dsDNA)

    Usually without introns

    Relatively high gene density (~2,500 genes per mm of E. coli DNA)

    Contour length of the E. coli genome: 1.7 mm

    Often indigenous plasmids are present

    Plasmids

    Extrachromosomal circular DNAs

    Found in bacteria, yeast and other fungi

    Size varies from ~3,000 bp to 100,000 bp

    Replicate autonomously (origin of replication)

    May contain resistance genes

    May be transferred from one bacterium to another

    May be transferred across kingdoms

    Multicopy plasmids (up to ~400 copies per cell)

    Low-copy plasmids (1-2 copies per cell)

    Plasmids may be incompatible with each other

    Used as vectors that can carry a foreign gene of interest (e.g. insulin)

    Figure: plasmid vector with a β-lactamase gene, an ori and a foreign gene

    Eukaryotic genome

    Moderately repetitive DNA

    Functional (protein coding, tRNA coding)

    Unknown function:

    SINEs (short interspersed elements): 200-300 bp, 100,000 copies

    LINEs (long interspersed elements): 1-5 kb, 10-10,000 copies

    Eukaryotic genome

    Highly repetitive DNA

    Minisatellites: repeats of 14-500 bp; arrays 1-5 kb long; scattered throughout the genome

    Microsatellites: repeats of up to 13 bp; arrays 100s of kb long; 10^6 copies; around the centromere

    Telomeres: short repeats (6 bp); 250-1,000 copies at the ends of chromosomes

    Eucaryotic genomes

    Located on several chromosomes

    Relatively low gene density (50 genes per mm of DNA in humans)

    Contour length of DNA from a single human cell = 2 meters

    Approximately 10^11 cells = total length 2 x 10^11 m (2 x 10^8 km), more than the distance between the sun and the earth (1.5 x 10^8 km)

    Human chromosomes vary in length over a 25-fold range

    Eukaryotic cells also carry organelle genomes
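    The per-cell and whole-body lengths follow from the 3.4 Å rise per base pair and the human genome size of 3200 x 10^6 bp given in the genome-size table. This is a back-of-the-envelope sketch. It assumes a diploid nucleus (two genome copies) and reads the slide's cell count as 10^11, a rough figure used only for the order-of-magnitude comparison.

```python
# Order-of-magnitude check of the DNA-length figures
RISE_M_PER_BP = 0.34e-9   # 3.4 Å per base pair, in metres
HAPLOID_BP = 3.2e9        # human genome size from the genome-size table
CELLS = 1e11              # assumed cell count (the slide's rough figure)
SUN_EARTH_KM = 1.5e8      # distance between the sun and the earth

per_cell_m = 2 * HAPLOID_BP * RISE_M_PER_BP   # diploid nucleus: two genome copies
total_km = per_cell_m * CELLS / 1000          # metres -> kilometres

print(f"{per_cell_m:.2f} m per cell")                              # 2.18 m per cell
print(f"{total_km:.1e} km in total")                               # 2.2e+08 km in total
print(f"{total_km / SUN_EARTH_KM:.1f} x the sun-earth distance")   # 1.5 x the sun-earth distance
```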

    Mitochondrial genome (mtDNA)

    Multiple identical circular chromosomes

    Size ~15 kb in animals

    Size ~200 kb to 2,500 kb in plants

    Over 95% of mitochondrial proteins are encoded in the nuclear genome

    Often A+T-rich genomes

    mtDNA is replicated before or during mitosis

    Chloroplast genome (cpDNA)

    Multiple circular molecules

    Size ranges from 120 kb to 160 kb

    Similar to mtDNA

    Many chloroplast proteins are encoded in the nucleus (they carry a separate signal sequence)

    Cellular genomes

    Viruses: viral genome (packaged in a capsid)

    Procaryotes: bacterial chromosome, plasmids

    Eucaryotes: chromosomes (nuclear genome, in the nucleus), mitochondrial genome, chloroplast genome

    Genome: all of an organism's genes plus intergenic DNA

    Intergenic DNA = DNA between genes

    Estimated genome sizes

    Figure: genome sizes in nucleotides on a log scale (10^1 to 10^12) for viruses (1024), bacteria (>100), fungi, mitochondria (~100), plants and mammals. Numbers in parentheses = completely sequenced genomes.

    Size of genomes

    Epstein-Barr virus: 0.172 x 10^6 bp

    E. coli: 4.6 x 10^6 bp

    S. cerevisiae: 12.1 x 10^6 bp

    C. elegans: 95.5 x 10^6 bp

    A. thaliana: 117 x 10^6 bp

    D. melanogaster: 180 x 10^6 bp

    H. sapiens: 3200 x 10^6 bp
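    The sizes above span more than four orders of magnitude, so they are easier to compare as fold differences or on a log scale. This is a small sketch that expresses each genome relative to a chosen reference. `fold_vs` is an illustrative helper, and E. coli is an arbitrary choice of reference.

```python
import math

# Genome sizes in base pairs, copied from the table above
GENOME_SIZES_BP = {
    "Epstein-Barr virus": 0.172e6,
    "E. coli": 4.6e6,
    "S. cerevisiae": 12.1e6,
    "C. elegans": 95.5e6,
    "A. thaliana": 117e6,
    "D. melanogaster": 180e6,
    "H. sapiens": 3200e6,
}


def fold_vs(reference):
    """Size of every genome divided by the size of `reference`."""
    ref = GENOME_SIZES_BP[reference]
    return {name: size / ref for name, size in GENOME_SIZES_BP.items()}


for name, fold in fold_vs("E. coli").items():
    log_size = math.log10(GENOME_SIZES_BP[name])
    print(f"{name:<20}{fold:>8.1f}x   log10(bp) = {log_size:.2f}")
```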

    Chromosome organization

    Figure: eucaryotic chromosome with a telomere at each end, a centromere, and the p-arm and q-arm

    Centromere: DNA sequence that serves as an attachment site for proteins during mitosis. In yeast these sequences (~130 nt) are very A+T rich. In higher eucaryotes centromeres are much longer and contain satellite DNA.

    Telomeres: at the ends of chromosomes; help stabilize the chromosome. In yeast, telomeres are ~100 bp long (imperfect repeats). Repeats are added by a specific enzyme, telomerase.

    Telomeric repeat:

    5′-(TxGy)n-3′

    3′-(AxCy)n-5′

    x and y = 1-4

    n = 20 to 100 (1,500 in mammals)
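    The (TxGy)n form maps directly onto a regular expression. This sketch uses Python's `re` module, and the helper names are illustrative. The example repeat TTGGGG (the Tetrahymena telomere repeat, x = 2 and y = 4) was chosen only because it fits the pattern.

```python
import re

# G-rich strand: one or more units of 1-4 T followed by 1-4 G, i.e. (TxGy)n
G_STRAND = re.compile(r"(?:T{1,4}G{1,4})+")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_telomeric_g_strand(seq):
    """True if the whole sequence has the (TxGy)n form."""
    return G_STRAND.fullmatch(seq.upper()) is not None


def c_strand(seq):
    """Complementary strand written 3'->5' beneath the G strand (not reversed)."""
    return seq.upper().translate(COMPLEMENT)


repeat = "TTGGGG" * 3
print(is_telomeric_g_strand(repeat))   # True
print(c_strand("TTGGGG"))              # AACCCC, i.e. the (AxCy)n strand
print(is_telomeric_g_strand("ACGT"))   # False
```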

    Gene classification

    Coding genes: messenger RNA → proteins (structural proteins, enzymes)

    Non-coding genes: structural RNA (transfer RNA, ribosomal RNA, other RNA)

    Figure: chromosome (simplified) with genes separated by intergenic regions

    What is a gene? Definitions

    1. Classical definition: portion of DNA that determines a single character (phenotype)

    2. One gene-one enzyme (Beadle & Tatum 1941): every gene encodes the information for one enzyme

    3. One gene-one protein: one gene contains the information for one protein (structural proteins included); or, one gene-one polypeptide

    4. Current definition: a piece of DNA (or in some cases RNA) that contains the primary sequence needed to produce a functional biological gene product (RNA, protein)

    Coding region

    Nucleotides (open reading frame) encoding the amino acid sequence of a protein

    The molecular definition of a gene includes more than just the coding region

    Noncoding regions

    Regulatory regions

    RNA polymerase binding site

    Transcription factor binding sites

    Introns

    Polyadenylation [poly(A)] sites

    Gene

    Molecular definition:

    Entire nucleic acid sequence necessary for the synthesis of a functional polypeptide (protein chain) or functional RNA

    Anatomy of a gene

    ORF: from the start codon (ATG) to a stop codon (TGA, TAA, TAG)

    Upstream region with binding sites (e.g. TATA box)

    Poly(A) tail

    Splice sites: introns are bounded by GT (at their 5′ end) and AG (at their 3′ end) splice signals
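    The ORF definition above can be turned into a simple scan of the forward strand: find an ATG, then read codon by codon until the first in-frame TGA, TAA or TAG. This is a minimal sketch. It ignores the reverse strand and ORFs that run off the end of the sequence. `find_orfs` and its `min_codons` threshold are illustrative names, not a standard tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return 0-based, half-open (start, end) spans of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon (stop included).
    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop codon before the end: not a complete ORF
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after this ORF's stop codon
                continue
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17)] -> ATG AAA TTT GGG TAA
```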

    Bacterial genes

    Most do not have introns

    Many are organized in operons: contiguous genes, transcribed as a single polycistronic mRNA, that encode proteins with related functions

    Polycistronic mRNA encodes several proteins

    What would be the effect of a mutation in the control region (a) compared to a mutation in a structural gene (b)?

    Bacterial operon

    Eucaryotic genes

    Figure: hemoglobin beta subunit gene: Exon 1 (90 bp), Intron A (131 bp), Exon 2 (222 bp), Intron B (851 bp), Exon 3 (126 bp)

    Introns: intervening sequences within a gene that are not translated into a protein sequence. Collagen has 50 introns.

    Exons: sequences within a gene that encode protein sequences

    Splicing: removal of introns from the primary transcript (pre-mRNA).
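    The lengths in the hemoglobin beta figure show how much of the gene splicing removes. This sketch assumes the exons and introns alternate in the order Exon 1, Intron A, Exon 2, Intron B, Exon 3. Only the lengths are used, not the sequences.

```python
# Hemoglobin beta subunit gene layout from the figure (lengths in bp),
# assumed to alternate exon / intron in this order
segments = [
    ("Exon 1", 90), ("Intron A", 131), ("Exon 2", 222),
    ("Intron B", 851), ("Exon 3", 126),
]

gene_length = sum(length for _, length in segments)
spliced_length = sum(length for name, length in segments if name.startswith("Exon"))

print(gene_length)          # 1420 bp before splicing
print(spliced_length)       # 438 bp after the two introns are removed
print(spliced_length / 3)   # 146.0 codons
```

    In this example the introns account for 982 of the 1,420 bp, and the remaining exon sequence divides evenly into codons.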

    Regulatory mechanisms

    Organize expression of genes (analogous to function calls)

    Promoter region (binding site), usually near the coding region

    Binding can block (inhibit) expression

    Computational challenges:

    Identify binding sites

    Correlate sequence to expression
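    Identifying binding sites often starts with a consensus scan. This is a minimal sketch. It assumes the commonly cited TATA-box consensus TATAWAW (W = A or T in IUPAC notation) and uses a made-up promoter sequence. Practical binding-site prediction often uses position weight matrices rather than a single exact pattern.

```python
import re

# Subset of IUPAC nucleotide codes, enough for this sketch
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def scan(seq, motif):
    """0-based start positions of all (possibly overlapping) motif matches."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(seq.upper())]


promoter = "GGCTATAAAAGGCCGCTATATAAGC"   # hypothetical upstream sequence
print(scan(promoter, "TATAWAW"))        # [3, 16]
```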

    Eukaryotic genes

    Most have introns

    Produce monocistronic mRNA: only one encoded protein

    Large

    Alternative splicing

    Splicing is the removal of introns

    The pre-mRNA of some genes can be spliced into two or more different mRNAs
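    Exon skipping is one simple way a single pre-mRNA can yield several mRNAs. This sketch enumerates the transcripts obtained when chosen "cassette" exons may each be included or skipped. The exon sequences and the `isoforms` helper are hypothetical. Other modes, such as alternative 5′ or 3′ splice sites and retained introns, are not modelled.

```python
from itertools import product


def isoforms(exons, optional):
    """Enumerate spliced transcripts when each exon index in `optional`
    may be included or skipped; all other exons are always kept, in order."""
    opt = sorted(optional)
    transcripts = []
    for choice in product([True, False], repeat=len(opt)):
        keep = dict(zip(opt, choice))
        transcripts.append("".join(e for i, e in enumerate(exons) if keep.get(i, True)))
    return transcripts


print(isoforms(["AAA", "CCC", "GGG"], optional={1}))  # ['AAACCCGGG', 'AAAGGG']
```

    With k optional exons this gives 2^k combinations, which is one reason alternative splicing can expand the number of distinct mRNAs well beyond the number of genes.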

    Nonfunctional DNA

    Higher eukaryotes have a lot of noncoding DNA

    Some of it has no known structural or regulatory function (no genes)

    Types of eukaryotic DNA

    Duplicated genes

    Encode closely related (homologous) proteins

    Clustered together in the genome

    Formed by duplication of an ancestral gene followed by mutation

    Figure: a gene cluster with five functional genes and two pseudogenes

    Pseudogenes

    Nonfunctional copies of genes

    Formed by duplication of an ancestral gene, or by reverse transcription (and integration)

    Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences
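    The first cause listed above, a premature stop codon, can be detected by reading the sequence in frame. In this sketch the three short sequences are invented to illustrate a point mutation (nonsense) and a single-base deletion (frameshift). Defects in mRNA processing and missing regulatory sequences are not checked.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds) -> Optional[int]:
    """Index (in codons) of the first stop codon read in frame from position 0."""
    cds = cds.upper()
    for k in range(0, len(cds) - 2, 3):
        if cds[k:k + 3] in STOP_CODONS:
            return k // 3
    return None


def has_premature_stop(cds) -> bool:
    """True if a stop codon appears before the last complete codon."""
    last_codon = len(cds) // 3 - 1
    idx = first_stop_codon(cds)
    return idx is not None and idx < last_codon


intact = "ATGGCTTGGAAATAA"     # ATG GCT TGG AAA TAA: stop only at the end
nonsense = "ATGGCTTAGAAATAA"   # TGG -> TAG substitution creates an early stop
deletion = "ATGGCTTGAAATAA"    # one G deleted: the frame now reads ATG GCT TGA
for name, seq in [("intact", intact), ("nonsense", nonsense), ("deletion", deletion)]:
    print(name, has_premature_stop(seq))
# intact False, nonsense True, deletion True
```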

    Repetitive DNA

    Moderately repeated DNA:

    Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)

    Large duplicated gene families

    Mobile DNA

    Simple-sequence DNA:

    Tandemly repeated short sequences, found in centromeres and telomeres (and elsewhere)

    Used in DNA fingerprinting to identify individuals

    Types of DNA repeats

    Tandem repeats (e.g. satellite DNA)

    Inverted repeats (e.g. in transposons)

    Direct repeat (ATGTGCTG occurs twice in the same orientation):

    5′-CATGTGCTGAAGGCTATGTGCTGCGACG-3′
    3′-GTACACGACTTCCGATACACGACGCTGC-5′

    Inverted repeat (ATGTGCTG ... CAGCACAT), which can fold into a stem-loop:

    5′-CATGTGCTGAAGGCTCAGCACATCGACG-3′
    3′-GTACACGACTTCCGAGTCGTGTAGCTGC-5′

    Palindromes = adjacent inverted repeats (e.g. restriction sites); they form hairpin structures

    Inverted repeats separated by a spacer form stem-loop structures

    Perfect repeats vs degenerate repeats
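    Palindromes and inverted repeats are both defined through the reverse complement. This sketch checks a palindromic site and searches for inverted-repeat arms of a fixed length. The helper names are illustrative, and only perfect (not degenerate) repeats are found.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement: the other strand read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(site):
    """True if the site reads the same 5'->3' on both strands (e.g. GAATTC)."""
    return site.upper() == revcomp(site)


def inverted_repeats(seq, arm):
    """Return (i, j) pairs where seq[j:j+arm] is the reverse complement of
    seq[i:i+arm] and the second arm starts after the first ends. The gap
    between the arms would form the loop; a gap of zero is a palindrome."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - arm + 1):
        target = revcomp(seq[i:i + arm])
        for j in range(i + arm, len(seq) - arm + 1):
            if seq[j:j + arm] == target:
                hits.append((i, j))
    return hits


print(is_palindrome("GAATTC"))  # True (EcoRI site)
print(inverted_repeats("CATGTGCTGAAGGCTCAGCACATCGACG", arm=8))
# the list includes (1, 15): ATGTGCTG pairs with CAGCACAT
```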

    Repetitive sequences

    Figure: chromosomal DNA and satellite DNA separated in a caesium chloride density gradient

    Repeats in the mouse genome:

    Type | No. of repeats | Repeat size | Percent of genome
    Highly repetitive | > 1 million | < 10 bp | 10 %
    Moderately repetitive | > 1,000 | ~150-300 bp | 20 %

    DNA repeats and forensics

    Figure: gel lanes M, F and Suspect for the Alu insertions AluSTXa (on X) and AluSTYa (on Y) in X-Y homologous regions; band sizes shown include 878, 556, 528 and 199 bp

    Gender determination:

    1) Standard technique: PCR amplification of the amelogenin locus (males = XY => 103 + 109 bp)

    2) AluSTXa: Alu insertion on X

    3) AluSTYa: Alu insertion on Y
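    The amelogenin test reads sex from the number of PCR bands. This sketch assumes the 103 bp band comes from X and the 109 bp band from Y, because the slide gives only the pair of sizes for males. It also allows ±1 bp for sizing error. `call_sex` is a hypothetical helper.

```python
# Band sizes quoted for males (XY); the X/Y assignment is an assumption
X_BAND_BP = 103
Y_BAND_BP = 109


def call_sex(observed_bp, tol=1):
    """Classify a sample from the amelogenin PCR band sizes it shows."""
    def seen(expected):
        return any(abs(size - expected) <= tol for size in observed_bp)

    has_x, has_y = seen(X_BAND_BP), seen(Y_BAND_BP)
    if has_x and has_y:
        return "male"
    if has_x:
        return "female"
    return "inconclusive"  # no X band: failed reaction or unexpected pattern


print(call_sex([103, 109]))  # male
print(call_sex([103]))       # female
print(call_sex([]))          # inconclusive
```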

    Mobile DNA

    Move within genomes

    Make up most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes

    L1 LINE is ~5% of human DNA (~50,000 copies)

    Alu is ~5% of human DNA (>500,000 copies)

    Some encode enzymes that catalyze movement

    Transposition

    Movement of mobile DNA

    Involves copying of the mobile DNA element and insertion into a new site in the genome

    Why?

    Molecular parasite: selfish DNA

    Probably have a significant effect on evolution by facilitating gene duplication, which provides the fuel for evolution, and exon shuffling

    RNA or DNA intermediate

    Transposon: moves using a DNA intermediate

    Retrotransposon: moves using an RNA intermediate

    Types of mobile DNA elements

    LTR (long terminal repeat)

    Flank viral retrotransposons and retroviruses

    Contain regulatory sequences: transcription start site and poly(A) site

    Proteins

    Most protein sequences (today) are inferred from DNA sequences

    What's wrong with this?

    Proteins (and nucleic acids) are modified after synthesis (e.g. mature RNA)

    Computational challenges:

    Identify (possible) aspects of the molecular life cycle

    Identify protein-protein and protein-nucleic acid interactions

    Genetic variation

    Variable number tandem repeats (minisatellites): 10-100 bp; forensic applications

    Short tandem repeat polymorphisms (microsatellites): 2-5 bp, 10-30 consecutive copies

    Single nucleotide polymorphisms
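    Microsatellites as defined above (a 2-5 bp unit repeated in tandem 10 or more times) can be located with a backreference regex. This is a minimal sketch with an illustrative `find_strs` helper. The lazy unit match reports the shortest repeating unit (CA rather than CACA), and a mononucleotide run is reported with a 2 bp unit.

```python
import re


def find_strs(seq, min_unit=2, max_unit=5, min_copies=10):
    """Yield (start, unit, copies) for runs of a short unit repeated in tandem.

    The unit length is limited to min_unit..max_unit bp and the run must
    contain at least min_copies consecutive copies.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        yield m.start(), unit, len(m.group(0)) // len(unit)


sample = "GG" + "CA" * 12 + "TT"
print(list(find_strs(sample)))  # [(2, 'CA', 12)]
```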

    Single nucleotide polymorphisms

    Frequency: about 1 per 2,000 bp

    Types:

    Silent

    Truncating

    Shifting

    Significance: account for much of individual variation

    Challenge: correlating SNPs with disease
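    For a single-base substitution in a coding sequence, the silent and truncating types can be told apart by translating the codon before and after the change. This sketch uses the standard genetic code. `classify_substitution` is a hypothetical helper. It adds "missense" and "stop-loss" labels that the slide does not list. "Shifting" (frameshift) variants arise from insertions or deletions, so they are outside its scope.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, pos, new_base):
    """Classify a single-base change at position pos (0, 1 or 2) of a codon."""
    codon = codon.upper()
    mutant = codon[:pos] + new_base.upper() + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent"
    if after == "*":
        return "truncating"  # nonsense: a new stop codon
    if before == "*":
        return "stop-loss"
    return "missense"


print(classify_substitution("GAA", 2, "G"))  # silent (Glu -> Glu)
print(classify_substitution("TGG", 2, "A"))  # truncating (Trp -> stop)
print(classify_substitution("GAG", 1, "T"))  # missense (Glu -> Val)
```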

    E. coli genome

    4.6 x 10^6 bp. One chromosome. Published 1997.

    4,285 protein-coding genes

    122 structural RNA genes

    Repeats. Regulatory elements. Transposons. Lateral transfers.
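    The gene density of this 4.6 x 10^6 bp genome with 4,285 protein-coding genes follows from its size and gene count. This sketch uses the 3.4 Å rise per base pair from the viral-genome slide. The result, roughly 2,700 genes per mm, is of the same order as the ~2,500 genes per mm quoted for E. coli earlier.

```python
RISE_MM_PER_BP = 3.4e-7   # 3.4 Å per base pair, expressed in millimetres
genome_bp = 4.6e6         # genome size from the slide
protein_genes = 4285      # protein-coding genes from the slide

length_mm = genome_bp * RISE_MM_PER_BP
genes_per_mm = protein_genes / length_mm
bp_per_gene = genome_bp / protein_genes

print(f"contour length: {length_mm:.2f} mm")             # 1.56 mm
print(f"gene density: {genes_per_mm:.0f} genes per mm")  # 2740 genes per mm
print(f"one gene every {bp_per_gene:.0f} bp")            # 1074 bp
```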

    E. coli protein functions

    Category | Genes | % of genes
    Regulatory | 45 | 1.05
    Cell structure | 182 | 4.24
    Transposons, etc. | 87 | 2.03
    Transport & binding | 281 | 6.55
    Putative transport | 146 | 3.40
    Replication, repair | 115 | 2.68
    Transcription | 55 | 1.28
    Translation | 182 | 4.24
    Enzymes | 251 | 5.85
    Unknown | 1632 | 38.06
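    The percentages can be recomputed from the counts. This sketch assumes the 4,285 protein-coding genes from the previous slide as the denominator. With that choice some values differ from the table in the last digit: Unknown comes out at 38.09% rather than 38.06%, so the table may have used a slightly different total. The listed categories also add up to only 2,976 genes, so the table is not exhaustive.

```python
# Gene counts per functional category, copied from the table above
COUNTS = {
    "Regulatory": 45,
    "Cell structure": 182,
    "Transposons, etc.": 87,
    "Transport & binding": 281,
    "Putative transport": 146,
    "Replication, repair": 115,
    "Transcription": 55,
    "Translation": 182,
    "Enzymes": 251,
    "Unknown": 1632,
}
TOTAL_GENES = 4285  # assumed denominator: protein-coding genes on the previous slide


def shares(counts, total):
    """Percentage of `total` genes in each category."""
    return {name: 100 * n / total for name, n in counts.items()}


for name, pct in shares(COUNTS, TOTAL_GENES).items():
    print(f"{name:<22}{COUNTS[name]:>6}{pct:>8.2f}%")
print("genes in listed categories:", sum(COUNTS.values()), "of", TOTAL_GENES)
```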