Molecular Biology of the Genome
Christine Queitsch
Department of Genome Sciences
Outline
• Information Flow in Genomics
• Gene Structure
• Genetic Linkage
• Mutations
• Chromatin Structure
DNA and the Flow of Information
The genetic material: DNA
- Four kinds of subunits (bases A, C, G, T)
Activities within the cell are performed by proteins
- Twenty kinds of subunits (amino acids)
[Figure: amino acid sequence of a protein chain, running from the H3N+ end to the COO- end]
A coding problem: four bases (A, C, G, T) must specify twenty amino acids
The “Central Dogma” of Molecular Biology
Information into protein flows one way
A universal code: 3 nucleotides = 1 amino acid
DNA → (transcription) → RNA → (translation) → Protein → phenotype
DNA → (replication) → DNA: heredity
DNA Structure
• Information content is in the sequence of bases along a DNA molecule
- rules of base pairing: each strand of the double helix has all the info needed to recreate the other strand
• Genetic variation: differences in the base sequence between different individuals
- why individuals vary in their phenotypes
• Redundancy in the code
- multiple ways that DNA can specify a single amino acid
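The base-pairing rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name and the example sequence are made up, and the only biology assumed is that A pairs with T, G pairs with C, and the two strands run antiparallel.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand_5to3: str) -> str:
    """Return the partner strand, also written 5'->3'."""
    # Pair every base, reading backwards because the strands are antiparallel.
    return "".join(PAIR[base] for base in reversed(strand_5to3))

watson = "ATGACTTCA"                  # hypothetical example sequence
crick = complementary_strand(watson)  # TGAAGTCAT
print(crick)
# Applying the rule twice recovers the original strand.
print(complementary_strand(crick) == watson)
```

Either strand alone is enough to rebuild the other, which is the property replication relies on.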
Central Dogma: DNA Replication
DNA structure: polarity and base pairing
[Figure: double helix with antiparallel strands, Watson (5’→3’) and Crick (3’→5’)]
A pairs with T; G pairs with C
DNA replication: what’s the point?
- duplicate the entire genome prior to cell division
- new subunits can only be added to the 3’OH of the growing chain
[Figure: replication fork showing the leading strand and the lagging strand, with 5’ and 3’ ends labeled]
Central Dogma: Transcription
Genes: specific segments along the chromosomal DNA that code for some function
Transcription: “copy” gene into RNA (to make a specific protein)
[Figure: chromosomal DNA carrying several genes, each with a promoter and a terminator, transcribed into mRNA]
Transcription
Transcription: “copy” gene into RNA to make a specific protein
[Figure: gene on double-stranded DNA, strands W (5’→3’) and C (3’→5’), with the coding (sense) strand and the template strand labeled, and RNA polymerase synthesizing mRNA]
Where’s the 5’ end of the gene? Of the mRNA?
Which way is RNA polymerase moving?
RNA (ribonucleic acid) uses uracil (U) in place of thymine (T)
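The relationship between the two DNA strands and the mRNA can be sketched as follows. This is an illustrative sketch only: the coding-strand sequence is a hypothetical example, and `transcribe` is a made-up helper name. The mRNA is built by pairing against the template strand, so it ends up matching the coding (sense) strand with U in place of T.

```python
DNA_PAIR = str.maketrans("ACGT", "TGCA")          # DNA base -> DNA partner
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base

def transcribe(template_3to5: str) -> str:
    """Build the mRNA (5'->3') from a template strand written 3'->5'."""
    return "".join(RNA_PAIR[base] for base in template_3to5)

coding_5to3 = "ATGACTTCAGTAACCATCTAA"            # hypothetical coding strand
template_3to5 = coding_5to3.translate(DNA_PAIR)  # partner strand, read 3'->5'
mrna = transcribe(template_3to5)

print(mrna)
# The mRNA has the same sequence as the coding strand, with U for T.
print(mrna == coding_5to3.replace("T", "U"))
```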
Practice Question
1. Which way (to the right or left) are RNA polymerases moving?
2. Which strand (W or C) is the template strand?
[Figure: gene on double-stranded DNA, strands W (5’→3’) and C (3’→5’)]
Processing of pre-mRNA
Eukaryotic genes are interrupted by introns (non-coding information). Introns must be removed from the RNA before translation in a process called “splicing.”
[Figure: gene with exons, introns, UTRs (untranslated regions) and the ORF; in the pre-mRNA the introns are discarded and the exons are spliced together to give the mature mRNA]
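Splicing can be pictured as keeping the exon intervals of the pre-mRNA and joining them in order. The sketch below is a toy illustration under stated assumptions: the pre-mRNA sequence and the exon coordinates are invented, and real splice-site recognition is far more involved than slicing a string.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon slices (0-based, end-exclusive) in order; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: exon, intron, exon, intron, exon.
pre_mrna = "AUGACU" + "GUAAGUUUUCAG" + "UCAGUAACC" + "GUGAGUCCCAG" + "AUCUAA"
exons = [(0, 6), (18, 27), (38, 44)]  # assumed exon coordinates

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGACUUCAGUAACCAUCUAA
```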
Review of the Central Dogma: Translation
Translating the nucleic acid code to a peptide code…
Possible coding systems:
• 1 base per amino acid: could only code for 4 amino acids!
• 2 bases per amino acid: could only code for 16 amino acids
• 3 bases per amino acid: 64 possible combinations… that’s plenty!
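The counting argument is simply 4^n code words for words of length n. A short sketch (the helper name `codewords` is made up for illustration) enumerates them and checks each length against the 20 amino acids that need encoding:

```python
from itertools import product

def codewords(n: int, alphabet: str = "ACGU") -> list[str]:
    """All possible words of length n over the four RNA bases."""
    return ["".join(p) for p in product(alphabet, repeat=n)]

for n in (1, 2, 3):
    count = len(codewords(n))
    # 20 amino acids must be encoded, so we need at least 20 words.
    print(n, count, "enough" if count >= 20 else "too few")
```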
The triplet code
• 3 bases = 1 amino acid
• More than 1 triplet can code for the same amino acid
Translation: reads the information in RNA to order the amino acids in a protein
[Figure: mRNA read 5’→3’ one codon at a time, giving a protein (NH3+ … COO-) with residues Met, Thr, Ser, Val, Thr, Phe]
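The redundancy of the triplet code can be checked by building the standard codon table and counting how many codons map to each amino acid. This is a sketch: the compact 64-letter string is one common way to write the standard genetic code (one-letter amino acid codes, `*` for stop), and the variable names are illustrative.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code, ordered by first, second, third base over U, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# How many codons specify each amino acid (or stop)?
degeneracy = Counter(CODON_TABLE.values())
print(degeneracy["L"], degeneracy["M"], degeneracy["*"])  # 6 1 3
```

Leucine (L) has six codons, while methionine (M) has only one.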
Punctuation:
Start: AUG = methionine, the first amino acid in (almost) all proteins
Stop: UAA, UAG, and UGA. NOT an amino acid!
[Figure: the same mRNA and protein, with STOP marked at the 3’ end of the coding sequence]
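Start and stop punctuation can be sketched as a small translation routine: find the first AUG, read triplets, and halt at the first in-frame stop. This is a simplified illustration, not how a ribosome actually selects a start site. The function name is made up, and the input is the example mRNA fragment used on the ribosome slides. Note that under the standard table the sixth codon of this string, AUC, reads as Ile (I).

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # UAA, UAG or UGA: stop, not an amino acid
            break
        peptide.append(residue)
    return "".join(peptide)

print(translate("AUAUGACUUCAGUAACCAUCUAACA"))  # MTSVTI
```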
The Genetic Code: Who is the interpreter? Where’s the dictionary? What are the rules of grammar?
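[Figure: an aminoacyl tRNA synthetase joins an amino acid (Met) to its tRNA, producing a charged tRNA]
tRNA = transfer RNA
anticodon: recognizes codon in mRNA
[Figure: anticodon 3’-UAC-5’ of the Met-tRNA base-paired with the codon 5’-AUG-3’ in the mRNA]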
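Codon-anticodon recognition follows the same pairing logic, with U in place of T. The sketch below assumes strict Watson-Crick pairing and ignores wobble at the third position; the helper name is hypothetical.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_3to5(codon_5to3: str) -> str:
    """Anticodon written 3'->5', so it lines up base by base under the codon."""
    return "".join(RNA_PAIR[base] for base in codon_5to3)

print(anticodon_3to5("AUG"))  # UAC, the Met anticodon
print(anticodon_3to5("ACU"))  # UGA, pairs with the Thr codon ACU
```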
The ribosome: mediates translation
Locates the 1st AUG, sets the reading frame for codon-anticodon base-pairing
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
After the 1st two tRNAs have bound…
[Figure: ribosome on the mRNA with Met-tRNA (anticodon UAC) and Thr-tRNA (anticodon UGA) paired to the first two codons]
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
• the ribosome breaks the Met-tRNA bond; Met is instead joined to the second amino acid
• …and the tRNA that carried Met is released
• …then the ribosome moves over by 1 codon in the 3’ direction
[Figure: ribosome with the Met-tRNA (anticodon UAC) in the P-site and the Thr-tRNA (anticodon UGA) in the A-site]
5’ …AUAUGACUUCAGUAACCAUCUAACA…UAG... 3’
Met Thr Ser Val Thr Phe STOP
When the ribosome reaches the Stop codon… termination
The finished peptide!
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
NH3+ (N-terminus) Met Thr Ser Val Thr Phe COO- (C-terminus)
Practice Question
Which strand on the DNA sequence is the coding (sense) strand? How can you tell?
Finding Sense in Nonsense
cbdryloiaucahjdhtheflybitthedogbutnotthecatjhhajctipheq
GGGTATAGAAAATGAATATAAACTCATAGACAAGATCGGTGAGGGAACATTTTCGTCAGTGTATAAAGCCAAAGATATCACTGGGAAAATAACAAAAAAATTTGCATCACATTTTTGGAATTATGGTTCGAACTATGTTGCTTTGAAGAAAATATACGTTACCTCGTCACCGCAAAGAATTTATAATGAGCTCAACCTGCTGTACATAATGACGGGATCTTCGAGAGTAGCCCCTCTATGTGATGCAAAAAGGGTGCGAGATCAAGTCATTGCTGTTTTACCGTACTATCCCCACGAGGAGTTCCGAACTTTCTACAGGGATCTACCAATCAAGGGAATCAAGAAGTACATTTGGGAGCTACTAAGAGCATTGAAGTTTGTTCATTCGAAGGGAATTATTCATAGAGACATCAAACCGACAAATTTTTTATTTAATTTGGAATTGGGGCGTGGAGTGCTTGTTGATTTTGGTCTAGCCGAGGCTCAAATGGATTATAAAAGCATGATATCTAGTCAAAACGATTACGACAATTATGCAAATACAAACCATGATGGTGGATATTCAATGAGGAATCACGAACAATTTTGTCCATGCATTATGCGTAATCAATATTCTCCTAACTCACATAACCAAACACCTCCTATGGTCACCATACAAAATGGCAAGGTCGTCCACTTAAACAATGTAAATGGGGTGGATCTGACAAAGGGTTATCCTAAAAATGAAACGCGTAGAATTAAAAGGGCTAATAGAGCAGGGACTCGTGGATTTCGGGCACCAGAAGTGTTAATGAAGTGTGGGGCTCAAAGCACAAAGATTGATATATGGTCCGTAGGTGTTATTCTTTTAAGTCTTTTGGGCAGAAGATTTCCAATGTTCCAAAGTTTAGATGATGCGGATTCTTTGCTAGAGTTATGTACTATTTTTGGTTGGAAAGAATTAAGAAAATGCGCAGCGTTGCATGGATTGGGTTTCGAAGCTAGTGGGCTCATTTGGGATAAACCAAACGGATATTCTAATGGATTGAAGGAATTTGTTTATGATTTGCTTAATAAAGAATGTACCATAGGTACGTTCCCTGAGTACAGTGTTGCTTTTGAAACATTCGGATTTCTACAACAAGAATTACATGACAGGATGTCCATTGAACCTCAATTACCTGACCCCAAGACAAATATGGATGCTGTTGATGCCTATGAGTTGAAAAAGTATCAAGAAGAAATTTGGTCCGATCATTATTGGTGCTTCCAGGTTTTGGAACAATGCTTCGAAATGGATCCTCAAAAGCGTAGTTCAGCAGAAGATTTACTGAAAACCCCGTTTTTCAATGAATTGAATGAAAACACATATTTACTGGATGGCGAGAGTACTGACGAAGATGACGTTGTCAGCTCAAGCGAGGCAGATTTGCTCGATAAGGATGTTCT
How do you find out if a sequence contains a gene? How do you identify the gene?
Reading Frame: the ribosome establishes the grouping of nucleotides that correspond to codons by the first AUG encountered.
ORF: open reading frame, from the first AUG to the first in-frame stop. The ORF encodes the information for the protein.
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
Starts counting triplets from this base (the A of the first AUG)
More generally: a reading frame with a stretch of codons not interrupted by stop – non-coding RNAs!
Looking for ORFs
- read the sequence 5’→3’, looking for stop
- try each reading frame
- since we know the genetic code, we can do a virtual translation if necessary
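The search procedure above can be sketched as a scan of the three reading frames of one strand, recording every stretch that runs from an AUG to the next in-frame stop. This is a minimal sketch: the function name and the `min_codons` threshold are assumptions, and a fuller search would also scan the three frames of the complementary strand.

```python
STOPS = {"UAA", "UAG", "UGA"}

def find_orfs(rna: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each AUG...stop stretch; end is exclusive."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i                      # first AUG opens the ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # look for the next ORF
    return orfs

rna = "AUAUGACUUCAGUAACCAUCUAACA"
for frame, start, end in find_orfs(rna):
    print(frame, rna[start:end])  # 2 AUGACUUCAGUAACCAUCUAA
```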
How to identify genes experimentally?
Gene Structure: The Parts List
Genomic DNA for a protein-coding eukaryotic gene comprises regulatory and coding sequences
[Figure: gene model with labeled parts: enhancer, promoter, 5’ UTR, exons, introns, 3’ UTR]
• Promoter – proximal regulatory element
• Enhancer – distal regulatory element
• CRM (cis-regulatory motif) – can be upstream or downstream of promoter, proximal or distal
Promoters
• Promoters are specific sites on DNA where RNA polymerase first binds to initiate the transcription of a gene
• Composed of a variety of different cis-sequence elements which recruit trans-acting factors through DNA-protein interactions
Core Promoter Elements
[Figure: core promoter, with position markers at ~-50, ~-30 and +1, showing the BRE, the TATA box (~-30) and the initiator (inr, PyPyANTAPyPy) at +1]
- not all elements required
- many promoters lack a TATA box, using instead the functionally analogous initiator (inr) element
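A consensus such as the initiator can be turned into a simple pattern search. The sketch below is an illustration under an explicit assumption: it reads the consensus as Py Py A N (T/A) Py Py, where Py is C or T and N is any base, which is one interpretation of the string on the slide. The function name and test sequence are hypothetical, and a match to a short consensus does not by itself show that a site is functional.

```python
import re

# Assumed reading of the inr consensus: Py Py A N (T/A) Py Py.
# The lookahead lets overlapping candidate sites be reported.
INR = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT]))")

def find_inr(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every candidate inr site."""
    return [(m.start(), m.group(1)) for m in INR.finditer(dna.upper())]

print(find_inr("GGGTCAGTCTGGG"))  # [(3, 'TCAGTCT')]
```

In each hit, the A at offset 2 of the match corresponds to the +1 position.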
Combinatorial Gene Regulation
• Most eukaryotic genes have multiple cis regulatory motifs located outside of the core promoter region
• Can be located in promoter proximal regions, 3’ downstream regions, and many kb away from target gene
• Allows for combinatorial control of gene expression
Distal regulatory elements: Enhancers
[Figure: enhancer and “enhanceosome”; source: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mcb.figgrp.2601]
- Can function in either orientation
- Can occur far (>50 kb) from the gene
- Can be up or downstream
- Range in size between ~50-200 bp
- Contain multiple TF binding sites
Untranslated Regions (UTRs)
[Figure: gene model with 5’ UTR, exons and 3’ UTR]
• Most eukaryotic mRNAs contain untranslated regions at their 5’ and 3’ ends
• The 5’ UTR is the region between the start of transcription and the start of translation
• The 3’ UTR is the region between the stop codon and poly-A tail
• Both the 5’ and 3’ UTRs can contain cis regulatory sequences that bind TFs, influence transport to the cytoplasm, mediate transcript stability, and control translation
Alternative Splicing
• mRNA from some genes can be spliced into two or more distinct transcripts
• Creates protein diversity (isoforms)
[Figure: intron bounded by a 5’ splice site and a 3’ splice site]
Let’s Play “Gene” or “No Gene”
A gene is often a segment of DNA that encodes a protein.
How about DNA that encodes:
• a micro RNA that binds to an mRNA to inhibit translation?
• an RNA spliced out of an intron and used for another function?
• an antisense transcript?
• a long non-coding RNA of unknown function?
• a pseudogene?
Transmission of Genetic Information
Elements of cell division
• Cell growth
• Chromosome duplication
• Chromosome segregation
[Figure: chromosomes condensed vs. decondensed through cell division; ploidy labels 2N (diploid) and 1N]
Meiosis
Interphase: Chromosomes replicate
Meiosis I: Reductive division, homologous chromosomes separate
Meiosis II: Sister chromatids separate
How Does Distance Between Loci Affect Transmission?
Independent Assortment: loci are unlinked or far enough apart that they are transmitted independently from one another
Genetic linkage: loci are close enough together on a chromosome to be transmitted together
Genetic Mapping
The frequency of recombination between loci depends on the distance between them
Recombination Is A Measure of Distance
• Recombination fraction = the probability that a recombinant gamete is transmitted
• If two loci are on different chromosomes, they will segregate independently
=> recombination fraction = 0.5
• If two loci are right next to each other, they will segregate together during meiosis
=> recombination fraction = 0
• Jargon:
recombination fraction < 0.5: the loci are close (they are linked)
recombination fraction = 0.5: the loci are far apart (they are not linked)
Recombination Is A Measure of Distance
Map Distance = (Number of Recombinant Gametes / Total Number of Gametes) x 100
Centimorgan (cM): a unit of chromosome length, equals the length of chromosome over which crossing-over occurs with 1% frequency
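The map-distance formula translates directly into code. This is a small sketch: the function names and the example counts (15 recombinants out of 200 gametes) are made up for illustration. The linear conversion to map units is only a reasonable approximation for nearby loci, because double crossovers between distant loci go uncounted.

```python
def recombination_fraction(recombinant: int, total: int) -> float:
    """Fraction of gametes that are recombinant (0 to 0.5 in expectation)."""
    return recombinant / total

def map_distance_cm(recombinant: int, total: int) -> float:
    """Map distance in centimorgans: percent recombinant gametes."""
    return 100 * recombinant / total

# Hypothetical cross: 15 recombinant gametes out of 200.
fraction = recombination_fraction(15, 200)
print(fraction, map_distance_cm(15, 200))             # 0.075 7.5
print("linked" if fraction < 0.5 else "not linked")
```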
Practice Question
• In maize, consider three recessive phenotypes: lazy growth (ll), glossy leaves (gg), and sugary endosperm (ss).
• The following cross was made: Ll Gg Ss x ll gg ss and the observed progeny distribution was as follows (neither gene order nor linkage phase is known)
Phenotype Number
wildtype 286
lazy 33
glossy 59
sugary 4
lazy, glossy 2
lazy, sugary 44
glossy, sugary 40
lazy, glossy, sugary 272
Total 740
• Determine order and distances among the three genes
Where to begin?
Rule 1: Two most-frequent gamete types are the parental types
Parental types will constitute ≥ 50% of all progeny, so…
L G S / l g s (wild-type for all) x l g s / l g s (lazy, glossy, sugary)
Progeny Phenotype
Progeny Genotypes
Number
wildtype L G S // l g s 286
lazy l G S // l g s 33
glossy L g S // l g s 59
sugary L G s // l g s 4
lazy,glossy l g S // l g s 2
lazy,sugary l G s // l g s 44
glossy,sugary L g s // l g s 40
lazy,glossy,sugary l g s // l g s 272
Total 740
L G S // l g s x l g s // l g s
Linkage phase in heterozygous parent?
L G S
l g s
L g S
l G s
L g s
l G S
L G s
l g S
…which variants of L, G, and S are physically linked and in which order?
Rule 2
• The double-recombinant gametes will be the two least frequent types.
[Figure: homologous chromosomes A B C and a b c]
Progeny Phenotype / Progeny Genotype / Number
wildtype L G S / l g s 286
lazy l G S / l g s 33
glossy L g S / l g s 59
sugary L G s / l g s 4
lazy,glossy l g S / l g s 2
lazy,sugary l G s / l g s 44
glossy,sugary L g s / l g s 40
lazy,glossy,sugary l g s / l g s 272
Total 740
Rule 3
• Effect of double crossovers is to interchange the members of the middle pair of alleles between the chromosomes
A B C / a b c → A b C / a B c
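Double-crossover types:
• L G s and l g S
Parental types:
• L G S and l g s
Which gene is in the middle?
Only the S alleles are exchanged between the parental and double-crossover types, so S is in the middle:
parental L S G / l s g, double-crossover L s G / l S g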
Now you know linkage phase of heterozygous parent
and gene order…how far apart are these genes?
Count the cross-overs between adjacent genes
• In parents, L allele on same homolog as S and l on same homolog as s. So if these get broken up ---> cross-over between L and S loci
• In parents, S on same homolog as G and s on same homolog as g. If these get broken up --> recombination between S and G loci
L S G
l s g
Rule 4: Reciprocal products are expected to occur in approximately equal numbers
• LGS ≈ lgs (286 ≈ 272)
• LSg ≈ lsG (59 ≈ 44)
• Lsg ≈ lSG (40 ≈ 33)
• LsG ≈ lSg (4 ≈ 2)
Rec Freq L-S: l G S 33 + L g s 40 + L G s 4 + l g S 2 = 79
Rec Freq S-G: L g S 59 + l G s 44 + L G s 4 + l g S 2 = 109
Progeny Phenotype / Progeny Genotype / # / Crossover or Non-Crossover?
wildtype L G S / l g s 286 Parental (NCO)
lazy l G S / l g s 33 single CO between L and S
glossy L g S / l g s 59 single CO between S and G
sugary L G s / l g s 4 double CO
lazy,glossy l g S / l g s 2 double CO
lazy,sugary l G s / l g s 44 single CO between S and G
glossy,sugary L g s / l g s 40 single CO between L and S
lazy,glossy,sugary l g s / l g s 272 Parental (NCO)
Total 740
79/740 or 10.7% of gametes recombinant between L & S.
distance between L & S = 10.7 map units
109/740 or 14.7% of gametes recombinant between S & G.
distance between S & G = 14.7 map units
L ---- 10.7 mu ---- S ---- 14.7 mu ---- G
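The whole three-point analysis can be replayed in a few lines, using the progeny counts from the practice question. This is a sketch of Rules 1 to 3 as stated on the slides; the variable and function names are illustrative. Each gamete from the heterozygous parent is written as a three-letter string in the arbitrary order L, G, S.

```python
# Gametes from the heterozygous parent, alleles listed in the order L, G, S.
counts = {"LGS": 286, "lGS": 33, "LgS": 59, "LGs": 4,
          "lgS": 2, "lGs": 44, "Lgs": 40, "lgs": 272}
total = sum(counts.values())

ranked = sorted(counts, key=counts.get)
double_co = ranked[:2]    # Rule 2: two least frequent types
parental = ranked[-2:]    # Rule 1: two most frequent types

# Rule 3: a double crossover swaps only the middle gene's alleles.
diff = [i for i in range(3) if parental[0][i] != double_co[0][i]]
if len(diff) != 1:  # compare against the other double-crossover type instead
    diff = [i for i in range(3) if parental[0][i] != double_co[1][i]]
middle = "LGS"[diff[0]]

def recombinants(i: int, j: int) -> int:
    """Gametes whose alleles at genes i and j are not a parental combination."""
    parental_pairs = {(p[i], p[j]) for p in parental}
    return sum(n for g, n in counts.items() if (g[i], g[j]) not in parental_pairs)

L, G, S = 0, 1, 2
print(middle)                                                 # S
print(recombinants(L, S), round(100 * recombinants(L, S) / total, 1))  # 79 10.7
print(recombinants(S, G), round(100 * recombinants(S, G) / total, 1))  # 109 14.7
```

Counting L-G recombinants directly gives 176/740 (about 23.8%), less than the sum of the two intervals, because double crossovers restore the parental combination of the outer genes.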
Causes and types of mutations
Small-scale mutations
• Spontaneous mutations – DNA decay (deamination, change in hydrogen bond, etc.)
• Replication errors – failure to repair DNA damage in template strand
• DNA repair errors – double strand break repairs are error-prone (NHEJ repair)
• Induced mutations – oxidative damage, mutagens (e.g. EMS)
-> substitutions, small insertions and deletions
Most commonly considered in human genetics!
Causes and types of mutations
Large-scale mutations
• Loss of heterozygosity – common in cancers
• Large duplications and deletions – errors in recombination through micro-homologies or repetitive (transposable) elements
• Chromosomal rearrangements – inversions or translocations of chromosomal segments
• Aneuploidy – whole chromosome loss or duplication
• Induced mutations – ionizing radiation causes dsDNA breaks
-> mutations often affect many genes, generate gene fusions, place genes in altered regulatory context, alter gene dosage
Commonly ignored in human genetics!
Mutation types and consequences
Recessive mutations – phenotypic consequences are buffered by wild-type allele, dosage from one allele is sufficient
Dominant mutations – phenotypic consequences arise from one mutated allele
• Gain of function – prevalent in cancer, gene fusions
• Haploinsufficiency – one wild-type allele not enough for function: copy number variant-associated disorders, autism, Williams syndrome, polydactyly, Marfan’s syndrome
• Dominant negative – poisons the function of the wild-type protein: p53, Marfan’s
Chromosome Structure: Coils of Coils of Coils…
[Figure: levels of DNA packaging, from the nucleosome to the fully condensed chromosome at mitosis]
Local unpacking of chromatin allows gene expression and replication
Histone Modification and Chromatin Activity
What Do These Modifications Do? A Histone Code?
• modifications change interaction with DNA and trans-factors
• can activate or repress transcription
• reinforce regulatory patterns set up by TFs
“Distinct histone modifications, on one or more tails, act sequentially or in combination that is read by other proteins to bring about distinct downstream events” (Strahl and Allis, 2000, Nature, 403:41)
Carey et al. Cell (2007) 128:707
DNA modification also contributes to chromatin state
DNA methylation can change the activity of a DNA segment without changing the sequence. In gene promoters, DNA methylation typically acts to repress gene transcription.
[Figure: methylated adenine]
DNA methylation is essential for normal development and is associated with a number of key processes including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging, and carcinogenesis.
DNA methylation patterns differ among organisms
• no DNA methylation in common model organisms such as C. elegans and D. melanogaster
• in plants and other organisms, DNA methylation occurs as CpG, CHG or CHH (where H corresponds to A, T or C)
• in mammals, almost exclusively as CpG, with the exception of embryonic stem cells and developing neuronal cells, which show CHH
• CpGs are depleted in mammalian genomes, with the exception of CpG islands in gene promoters (in ~70% of genes, un-methylated)
Deamination of 5-methyl cytosine is mutagenic
• spontaneous deamination -> C/G into T/A
• most common base substitution in human genome
• explains CpG depletion in the genome
• unmethylated CpGs in promoters protected by DNA repair
[Figure: deamination of cytosine to uracil]
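CpG depletion is commonly quantified as an observed/expected ratio: the number of CG dinucleotides found, divided by the number expected from the C and G content alone. The sketch below uses that widely used ratio; the function name and toy sequences are made up, and any cutoff for calling a CpG island is left out on purpose.

```python
def cpg_obs_exp(dna: str) -> float:
    """Observed CpG count divided by the count expected from base composition."""
    dna = dna.upper()
    c, g = dna.count("C"), dna.count("G")
    if c == 0 or g == 0:
        return 0.0  # no CpG is possible without both C and G
    observed = dna.count("CG")
    expected = c * g / len(dna)
    return observed / expected

print(cpg_obs_exp("CGCGCGCG"))  # 2.0, CpG-rich toy sequence
print(cpg_obs_exp("CATGCATG"))  # 0.0, C and G present but never as CpG
```

A ratio well below 1 over bulk genomic sequence is what CpG depletion looks like; CpG islands stand out as local stretches where the ratio is much higher.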
Genome sequencing
ENCODE, modENCODE, “plantENCODE”