8/2/2019 Genome Org1
Genome Organization & Protein Synthesis and Processing in Plants
Viral genomes
Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or circular
Viruses with RNA genomes:
  Almost all plant viruses and some bacterial and animal viruses
  Genomes are rather small (a few thousand nucleotides)
Viruses with DNA genomes (e.g. lambda = 48,502 bp):
  Often a circular genome
Replicative form of viral genomes:
  All ssRNA viruses produce dsRNA molecules
  Many linear DNA molecules become circular
Molecular weight and contour length:
  Duplex length per nucleotide pair = 3.4 Å
  Mol. weight per base pair = ~660
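The two conversion factors on this slide can be applied directly to the lambda genome. The following is a minimal sketch; the function names are illustrative, and it assumes a fully double-stranded, linear molecule with the slide's values of 3.4 Å and ~660 Da per base pair.

```python
# Conversion factors taken from the slide (assumed to apply per base pair).
RISE_PER_BP_ANGSTROM = 3.4
DALTONS_PER_BP = 660


def contour_length_um(n_bp: int) -> float:
    """Contour length of a DNA duplex in micrometres (1 um = 10,000 angstroms)."""
    return n_bp * RISE_PER_BP_ANGSTROM / 1e4


def molecular_weight_da(n_bp: int) -> int:
    """Approximate molecular weight of a DNA duplex in daltons."""
    return n_bp * DALTONS_PER_BP


lambda_bp = 48_502  # lambda genome size quoted on the slide
print(f"contour length: {contour_length_um(lambda_bp):.1f} um")
print(f"molecular weight: {molecular_weight_da(lambda_bp):.2e} Da")
```

With these factors the lambda genome comes out at roughly 16.5 µm and about 3.2 x 10^7 Da.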
Procaryotic genomes
Generally 1 circular chromosome (dsDNA)
Usually without introns
Relatively high gene density (~2500 genes per mm of E. coli DNA)
Contour length of E. coli genome: 1.7 mm
Often indigenous plasmids are present
Plasmids
Extrachromosomal circular DNAs
Found in bacteria, yeast and other fungi
Size varies from ~3,000 bp to 100,000 bp
Replicate autonomously (origin of replication)
May contain resistance genes
May be transferred from one bacterium to another
May be transferred across kingdoms
Multicopy plasmids (up to ~400 plasmids per cell)
Low copy plasmids (1-2 copies per cell)
Plasmids may be incompatible with each other
Are used as vectors that can carry a foreign gene of interest (e.g. insulin)
[Figure: plasmid map with labels β-lactamase, ori, foreign gene]
Eukaryotic genome
Moderately repetitive
  Functional (protein coding, tRNA coding)
  Unknown function
    SINEs (short interspersed elements)
      200-300 bp
      100,000 copies
    LINEs (long interspersed elements)
      1-5 kb
      10-10,000 copies
Eukaryotic genome
Highly repetitive
  Minisatellites
    Repeats of 14-500 bp
    1-5 kb long
    Scattered throughout genome
  Microsatellites
    Repeats up to 13 bp
    100s of kb long, 10^6 copies
    Around centromere
  Telomeres
    Short repeats (6 bp)
    250-1,000 at ends of chromosomes
Eucaryotic genomes
Located on several chromosomes
Relatively low gene density (50 genes per mm of DNA in humans)
Contour length of DNA from a single human cell = 2 meters
Approximately 10^11 cells = total length 2 x 10^11 m (2 x 10^8 km)
Distance between sun and earth: 1.5 x 10^8 km
Human chromosomes vary in length over a 25-fold range
Carry organelle genomes as well
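The length comparison on this slide is a one-line unit conversion. As a quick sanity check, here is a small sketch using only the slide's own figures (2 m of DNA per cell, about 10^11 cells, sun-earth distance 1.5 x 10^8 km); the function name is illustrative.

```python
DNA_PER_CELL_M = 2.0   # contour length of DNA in one human cell (slide value)
CELL_COUNT = 1e11      # number of cells assumed on the slide
SUN_EARTH_KM = 1.5e8   # sun-earth distance quoted on the slide


def total_dna_length_km(dna_per_cell_m: float, cells: float) -> float:
    """Total DNA length in kilometres (1 km = 1000 m)."""
    return dna_per_cell_m * cells / 1000.0


total_km = total_dna_length_km(DNA_PER_CELL_M, CELL_COUNT)
print(f"total DNA length: {total_km:.1e} km")
print(f"sun-earth distances: {total_km / SUN_EARTH_KM:.2f}")
```

Under these assumptions the total is 2 x 10^11 m, that is 2 x 10^8 km, or a little more than one sun-earth distance.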
Mitochondrial genome (mtDNA)
Multiple identical circular chromosomes
Size ~15 kb in animals
Size ~200 kb to 2,500 kb in plants
Over 95% of mitochondrial proteins are encoded in the nuclear genome
Often A+T rich genomes
mtDNA is replicated before or during mitosis
Chloroplast genome (cpDNA)
Multiple circular molecules
Size ranges from 120 kb to 160 kb
Similar to mtDNA
Many chloroplast proteins are encoded in the nucleus (separate signal sequence)
Cellular Genomes
Viruses: viral genome (inside the capsid)
Procaryotes: bacterial chromosome, plasmids
Eucaryotes: chromosomes (nuclear genome, inside the nucleus), mitochondrial genome, chloroplast genome
Genome: all of an organism's genes plus intergenic DNA
Intergenic DNA = DNA between genes
Estimated genome sizes
[Figure: genome size ranges on a logarithmic axis from 1e1 to 1e12 nucleotides for viruses (1024), bacteria (>100), fungi, mitochondria (~100), plants and mammals]
Size in nucleotides. Number in ( ) = completely sequenced genomes
Size of genomes
Epstein-Barr virus    0.172 x 10^6
E. coli               4.6 x 10^6
S. cerevisiae         12.1 x 10^6
C. elegans            95.5 x 10^6
A. thaliana           117 x 10^6
D. melanogaster       180 x 10^6
H. sapiens            3200 x 10^6
Chromosome organization
Eucaryotic chromosome
[Figure: chromosome showing telomere, p-arm, centromere, q-arm, telomere]
Centromere: DNA sequence that serves as an attachment site for proteins during mitosis
  In yeast these sequences (~130 nt) are very A+T rich
  In higher eucaryotes centromeres are much longer and contain satellite DNA
Telomeres: at the ends of chromosomes; help stabilize the chromosome
  In yeast telomeres are ~100 bp long (imperfect repeats)
  Repeats are added by a specific telomerase
  5'-(TxGy)n
  3'-(AxCy)n
  x and y = 1 - 4
  n = 20 to 100 (1500 in mammals)
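The (TxGy)n notation can be expanded mechanically into a sequence. The sketch below is illustrative only: the function names are hypothetical, and it builds perfect repeats, whereas the slide notes that real telomere repeats are imperfect.

```python
def telomere_repeat(x: int, y: int, n: int) -> str:
    """Return the 5'->3' strand (TxGy)n as a string."""
    if not (1 <= x <= 4 and 1 <= y <= 4):
        raise ValueError("the slide gives x and y in the range 1-4")
    return ("T" * x + "G" * y) * n


def complement(seq: str) -> str:
    """Base-by-base complement, written 3'->5' so it aligns under the input."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


top = telomere_repeat(2, 4, 3)   # x=2, y=4 gives the unit TTGGGG
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
```

The second line printed is the (AxCy)n strand from the slide, read 3' to 5'.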
Gene classification
Coding genes -> messenger RNA -> proteins (structural proteins, enzymes)
Non-coding genes -> structural RNA (transfer RNA, ribosomal RNA, other RNA)
[Figure: simplified chromosome with genes separated by intergenic regions]
What is a gene ? Definitions
1. Classical definition: Portion of a DNA that determines a single character (phenotype)
2. One gene-one enzyme (Beadle & Tatum 1941): Every gene encodes the information for one enzyme
3. One gene-one protein: One gene contains information for one protein (structural proteins included); one gene-one polypeptide
4. Current definition: A piece of DNA (or in some cases RNA) that contains the primary sequence to produce a functional biological gene product (RNA, protein)
Coding region
Nucleotides (open reading frame) encoding
the amino acid sequence of a protein
The molecular definition of gene includes
more than just the coding region
Noncoding regions
Regulatory regions
RNA polymerase binding site
Transcription factor binding sites
Introns
Polyadenylation [poly(A)] sites
Gene
Molecular definition:
Entire nucleic acid sequence necessary for the synthesis of a functional polypeptide (protein chain) or functional RNA
Anatomy of a gene
ORF. From start (ATG) to stop (TGA, TAA, TAG).
Upstream region with binding site (e.g. TATA box).
Poly(A) tail.
Splice sites. Introns begin with GT and end with AG (splice signals).
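The ORF definition above (ATG through the first in-frame TGA, TAA or TAG) translates into a short scan. This is a simplified sketch, not a gene finder: it reads one strand only, ignores introns and minimum-length filters, and the function name is illustrative.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}


def find_orfs(dna: str) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs in the three forward frames.

    start is the index of the A in ATG; end is exclusive and includes
    the stop codon, so dna[start:end] is the full ORF.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))    # close it at the first in-frame stop
                start = None
    return sorted(orfs)


seq = "CCATGAAATTTTAGGG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

An ATG with no downstream in-frame stop is deliberately not reported, since by the definition above it is not a complete ORF.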
Bacterial genes
Most do not have introns
Many are organized in operons: contiguous genes, transcribed as a single polycistronic mRNA, that encode proteins with related functions
Polycistronic mRNA encodes several proteins
Bacterial operon
What would be the effect of a mutation in the control region (a) compared to a mutation in a structural gene (b)?
Eucaryotic genes
Hemoglobin beta subunit gene
  Exon 1: 90 bp
  Intron A: 131 bp
  Exon 2: 222 bp
  Intron B: 851 bp
  Exon 3: 126 bp
Introns: intervening sequences within a gene that are not translated into a protein sequence. Collagen has 50 introns.
Exons: sequences within a gene that encode protein sequences
Splicing: removal of introns from the primary transcript (pre-mRNA) to give the mature mRNA
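Splicing can be illustrated with the segment lengths given for the hemoglobin beta subunit gene. In this sketch the 5'-to-3' order exon 1, intron A, exon 2, intron B, exon 3 is an assumption, the placeholder sequence stands in for the real one, and the function name is illustrative.

```python
# (kind, length in bp); lengths from the slide, order assumed.
HBB_SEGMENTS = [
    ("exon", 90),
    ("intron", 131),
    ("exon", 222),
    ("intron", 851),
    ("exon", 126),
]


def splice(pre_mrna: str, segments: list[tuple[str, int]]) -> str:
    """Concatenate the exon stretches of pre_mrna and drop the introns."""
    pieces = []
    pos = 0
    for kind, length in segments:
        if kind == "exon":
            pieces.append(pre_mrna[pos:pos + length])
        pos += length
    return "".join(pieces)


# Placeholder transcript: 'E' marks exon positions, 'i' marks intron positions.
pre_mrna = "".join(("E" if kind == "exon" else "i") * n for kind, n in HBB_SEGMENTS)
mature = splice(pre_mrna, HBB_SEGMENTS)
print(len(pre_mrna), "bp before splicing")
print(len(mature), "bp after splicing")
```

With the slide's numbers the five segments add up to 1420 bp, of which 438 bp remain after the two introns are removed.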
Regulatory mechanisms
Organize expression of genes (function calls)
Promoter region (binding site), usually near coding region
Binding can block (inhibit) expression
Computational challenges
  Identify binding sites
  Correlate sequence to expression
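The simplest approach to the first computational challenge is a consensus-motif scan. The sketch below is an illustration under stated assumptions: the TATA-box consensus TATAWAW (W = A or T) is a commonly quoted pattern used here as an example, the IUPAC table is deliberately partial, and the function name is hypothetical. Real binding-site prediction usually relies on position weight matrices rather than exact consensus matches.

```python
import re

# Partial IUPAC nucleotide code table (enough for this example).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "N": "[ACGT]",
}


def find_sites(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead matches without consuming, so overlapping sites are reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


upstream = "GGCTATAAAAGGC"   # made-up upstream region
print(find_sites(upstream, "TATAWAW"))
```

Only the forward strand is scanned here; a fuller version would also scan the reverse complement.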
Eukaryotic genes
Most have introns
Produce monocistronic mRNA: only one
encoded protein
Large
Alternative splicing
Splicing is the removal of introns
mRNA from some genes can be spliced into
two or more different mRNAs
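One way to see how a single gene can give two or more mRNAs is to enumerate exon-skipping variants. The sketch below models only one form of alternative splicing (skipping of internal exons, with the first and last exon always kept); that restriction, the exon labels and the function name are assumptions made for illustration.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[list[str]]:
    """List every isoform that keeps the first and last exon and any
    subset of the internal exons, preserving exon order."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons
    isoforms = []
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append([first, *kept, last])
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

A gene with k internal exons yields 2^k variants under this model; which of them actually occur in a cell is a biological question the enumeration cannot answer.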
Nonfunctional DNA
Higher eukaryotes have a lot of noncoding DNA
Some has no known structural or regulatory function (no genes)
Types of eukaryotic DNA
Duplicated genes
Encode closely related (homologous) proteins
Clustered together in genome
Formed by duplication of an ancestral gene followed by mutation
Five functional genes and two pseudogenes
Pseudogenes
Nonfunctional copies of genes
Formed by duplication of ancestral gene, or reverse transcription (and integration)
Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences
Repetitive DNA
Moderately repeated DNA
  Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
  Large duplicated gene families
  Mobile DNA
Simple-sequence DNA
  Tandemly repeated short sequences
  Found in centromeres and telomeres (and others)
  Used in DNA fingerprinting to identify individuals
Types of DNA repeats
Tandem repeats (e.g. satellite DNA)
  5'-CATGTGCTGAAGGCTATGTGCTGCGACG-3'
  3'-GTACACGACTTCCGATACACGACGCTGC-5'
Inverted repeats (e.g. in transposons)
  5'-CATGTGCTGAAGGCTCAGCACATCGACG-3'
  3'-GTACACGACTTCCGAGTCGTGTAGCTGC-5'
  Form stem-loop structures (stem, loop)
Palindromes = adjacent inverted repeats (e.g. restriction sites)
  Form hairpin structures
Perfect repeats vs degenerate repeats
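Inverted repeats and palindromes can both be detected with a reverse-complement comparison. The sketch below uses the inverted-repeat sequence from this slide; the function names, the fixed arm length and the loop-size limit are illustrative choices, and only perfect repeats are found.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement (e.g. a restriction site)."""
    return seq == reverse_complement(seq)


def find_inverted_repeat(seq: str, arm_len: int, max_loop: int):
    """Return (arm_start, loop_len) for the first arm whose reverse complement
    lies within max_loop bases downstream, or None if there is none."""
    for i in range(len(seq) - 2 * arm_len + 1):
        target = reverse_complement(seq[i:i + arm_len])
        for loop in range(max_loop + 1):
            j = i + arm_len + loop
            if seq[j:j + arm_len] == target:
                return i, loop
    return None


slide_seq = "CATGTGCTGAAGGCTCAGCACATCGACG"   # inverted-repeat example from the slide
print(find_inverted_repeat(slide_seq, arm_len=8, max_loop=10))
print(is_palindrome("GAATTC"))   # EcoRI recognition site
```

For the slide sequence the two 8-base arms ATGTGCTG and CAGCACAT are separated by the 6-base loop AAGGCT, which is the stem-loop arrangement described above. A palindrome is the special case with a loop length of zero.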
Repetitive sequences
[Figure: caesium chloride density gradient separating chromosomal DNA from satellite DNA]
Repeats in the mouse genome
Type                    No. of repeats   Size               Percent of genome
Highly repetitive       > 1 million      < 10 bp            10 %
Moderately repetitive   > 1000           ~150 - ~300 bp     20 %
DNA repeats and forensics
Gender determination
1) Standard technique: PCR amplification of the amelogenin locus (males = XY => 103 + 109 bp)
2) AluSTXa: Alu insertion on X
3) AluSTYa: Alu insertion on Y
[Figure: Alu sequence insertions in X-Y homologous regions, and gel images with lanes M, F and Suspect for the AluSTXa and AluSTYa loci; band sizes 878 bp and 556 bp on one gel, 528 bp and 199 bp on the other]
Mobile DNA
Move within genomes
Make up most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes
  L1 LINE is ~5% of human DNA (~50,000 copies)
  Alu is ~5% of human DNA (>500,000 copies)
Some encode enzymes that catalyze movement
Transposition
Movement of mobile DNA
Involves copying of mobile DNA element
and insertion into new site in genome
Why?
Molecular parasite: selfish DNA
Probably have significant effect on evolution by facilitating gene duplication, which provides the fuel for evolution, and exon shuffling
RNA or DNA intermediate
Transposon moves using DNA intermediate
Retrotransposon moves using RNA intermediate
Types of mobile DNA elements
LTR (long terminal repeat)
Flank viral retrotransposons and retroviruses
Contain regulatory sequences
  Transcription start site and poly(A) site
Proteins
Most protein sequences (today) are inferred
  What's wrong with this?
  Proteins (and nucleic acids) are modified (e.g. mature RNA)
Computational challenges
  Identify (possible) aspects of molecular life cycle
  Identify protein-protein and protein-nucleic acid interactions
Genetic variation
Variable number tandem repeats (minisatellites). 10-100 bp. Forensic applications.
Short tandem repeat polymorphisms (microsatellites). 2-5 bp, 10-30 consecutive copies.
Single nucleotide polymorphisms
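Tandem repeat polymorphisms differ between individuals in the number of consecutive copies, so the basic measurement is a copy count. The sketch below counts the longest uninterrupted run of a given repeat unit; the function name and the test sequences are made up for illustration, and real genotyping measures PCR fragment lengths rather than scanning sequence text.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Number of copies in the longest uninterrupted run of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


allele_a = "GG" + "CA" * 12 + "TT" + "CA" * 3   # made-up allele with 12 CA copies
allele_b = "GG" + "CA" * 17 + "TT" + "CA" * 3   # made-up allele with 17 CA copies
print(longest_tandem_run(allele_a, "CA"), longest_tandem_run(allele_b, "CA"))
```

Both made-up alleles fall inside the 10-30 copy range given above for microsatellites.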
Single nucleotide polymorphisms
1/2000 bp.
Types
Silent
Truncating
Shifting
Significance: much of individual variation.
Challenge: correlation to disease
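The silent and truncating types can be told apart by translating the codon before and after the substitution. The sketch below uses the standard genetic code; the function name is illustrative, and the extra "missense" label (an amino acid change) is added here as an assumption for substitutions that fit neither of those two slide categories. It expects a sense codon as input, and the "shifting" type is not covered because a frameshift needs an insertion or deletion rather than a substitution.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at `position` (0-2) of a sense codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"       # same amino acid
    if after == "*":
        return "truncating"   # new stop codon
    return "missense"         # different amino acid (label not on the slide)


print(classify_snp("GAA", 2, "G"))   # GAA -> GAG
print(classify_snp("TGG", 2, "A"))   # TGG -> TGA
print(classify_snp("GAG", 1, "T"))   # GAG -> GTG
```

The three calls illustrate one substitution of each kind: Glu to Glu, Trp to stop, and Glu to Val.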
E. coli genome
4.6 x 10^6 bp. One chromosome. Published 1997.
4,285 protein-coding genes
122 structural RNA genes
Repeats. Regulatory elements. Transposons.
Lateral transfers.
E. coli protein functions
Function              Number   Percent
Regulatory            45       1.05
Cell structure        182      4.24
Transposons, etc.     87       2.03
Transport & binding   281      6.55
Putative transport    146      3.40
Replication, repair   115      2.68
Transcription         55       1.28
Translation           182      4.24
Enzymes               251      5.85
Unknown               1632     38.06