Molecular Biology of the Genome

Christine Queitsch

Department of Genome Sciences

[email protected]

Outline

• Information Flow in Genomics

• Gene Structure

• Genetic Linkage

• Chromatin Structure

• Genome Sequencing

DNA and the Flow of Information

The genetic material: DNA - Four kinds of subunits (bases A, C, G, T)

[Figure: the amino acid sequence of a protein, drawn residue by residue from the N-terminus (H3N+) to the C-terminus (COO-)]

Activities within the cell performed by proteins - Twenty kinds of subunits (amino acids)

A coding problem: four kinds of bases (A, C, G, T) have to specify twenty kinds of amino acids

The “Central Dogma” of Molecular Biology

Information into protein flows one way

A universal code: 3 nucleotides = 1 amino acid

[Diagram: DNA is copied into DNA by replication (heredity); DNA is copied into RNA by transcription; RNA is read into protein by translation (phenotype)]

DNA Structure

• Information content is in the sequence of bases along a DNA molecule

  rules of base pairing: each strand of the double helix has all the info needed to recreate the other strand

• Genetic variation: differences in the base sequence between different individuals

  why individuals vary in their phenotypes

• Redundancy in the code

  multiple ways that DNA can specify a single amino acid
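
The base-pairing rules (A with T, G with C) mean that one strand determines the other. A minimal Python sketch of that idea follows; the function name `reverse_complement` and the example sequence are assumptions made for illustration, not part of the original slides.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'.

    The two strands are antiparallel, so the complement is reversed
    to keep the conventional 5'-to-3' reading direction.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


watson = "ATGACTTCA"  # hypothetical example sequence, 5' to 3'
crick = reverse_complement(watson)
print(crick)                      # TGAAGTCAT
print(reverse_complement(crick))  # ATGACTTCA (the original strand is recovered)
```

Applying the function twice returns the starting strand, which is the sense in which each strand carries all the information needed to recreate the other.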

Central Dogma: DNA Replication

DNA structure: polarity and base pairing

[Diagram: the two strands of the double helix, labeled Watson and Crick, with their 5’ and 3’ ends marked in opposite orientations]

A pairs with T

G pairs with C

DNA replication: what’s the point?

duplicate the entire genome prior to cell division

new subunits can only be added to the 3’OH of the growing chain

[Diagram: a replication fork with the leading strand and the lagging strand, 5’ and 3’ ends marked]

Central Dogma: Transcription

Genes: specific segments along the chromosomal DNA that code for some function

Transcription: “copy” gene into RNA (to make a specific protein)

[Diagram: three genes along the chromosomal DNA, each running from a promoter to a terminator and transcribed into its own mRNA]

Transcription

Transcription: “copy” gene into RNA to make a specific protein

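[Diagram: a gene on double-stranded DNA, strands labeled w and c with 5’ and 3’ ends marked; the coding (sense) strand, the template strand, RNA polymerase, and the growing mRNA are labeled]

Where’s the 5’ end of the gene? of the mRNA?

Which way is RNA polymerase moving?

RNA: ribonucleic acid… uses uracil (U) in place of thymine (T)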
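
The relationship between the two DNA strands and the mRNA can be sketched in Python. This is an illustrative sketch only: the function names and the short example sequences are assumptions, and real transcription involves promoters, terminators, and processing that are not modeled here.

```python
# Pairing used when RNA is built on a DNA template: A->U, T->A, G->C, C->G.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the mRNA (5' to 3') from the template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_to_mrna(coding_5_to_3):
    """The mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


coding = "ATGACTTCA"    # hypothetical coding (sense) strand, 5' to 3'
template = "TACTGAAGT"  # its partner, the template strand, written 3' to 5'

print(transcribe(template))    # AUGACUUCA
print(coding_to_mrna(coding))  # AUGACUUCA
```

Both routes give the same mRNA, which is why the coding strand is also called the sense strand.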

Transcription in vivo

[Figure: a gene being transcribed in vivo, showing the DNA, the RNA polymerases along it, and the nascent RNA transcripts]

Practice Question

1. Which way (to the right or left) are RNA polymerases moving?

2. Which strand (W or C) is the template strand?

[Diagram: a gene on double-stranded DNA, strands labeled w and c with 5’ and 3’ ends marked]

Processing of pre-mRNA

Eukaryotic genes are interrupted by introns (non-coding information). They must be removed from the RNA before translation in a process called “splicing.”

[Diagram: a gene with its exons, introns, ORF, and UTRs (untranslated regions). In the pre-mRNA the introns are discarded and the exons are spliced together to give the mature mRNA]
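
Splicing can be pictured as keeping the exon intervals of the pre-mRNA and discarding everything between them. The sketch below is a toy illustration: the pre-mRNA sequence, the exon coordinates, and the function name `splice` are all hypothetical, and the cell locates splice sites from sequence signals rather than from given coordinates.

```python
def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) in order; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exon 1, intron 1, exon 2, intron 2, exon 3.
pre_mrna = "AUGACU" + "GUAAGUCCAG" + "UCAGUA" + "GUGAGUUUAG" + "ACCAUCUAA"
exons = [(0, 6), (16, 22), (32, 41)]  # assumed exon coordinates

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGACUUCAGUAACCAUCUAA
```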

Review of the Central Dogma: Translation

Translating the nucleic acid code to a peptide code…

Possible coding systems:

1 base per amino acid

Could only code for 4 amino acids!

2 bases per amino acid

Could only code for 16 amino acids

3 bases per amino acid

64 possible combinations… that’s plenty!
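
The counting argument above is just powers of four. A short Python check, with function names chosen for illustration:

```python
BASES = 4         # A, C, G, U
AMINO_ACIDS = 20  # kinds of amino acids to encode


def codon_capacity(length):
    """Number of distinct words of the given length over four bases."""
    return BASES ** length


def minimum_codon_length(needed=AMINO_ACIDS):
    """Smallest word length whose capacity covers the needed number of amino acids."""
    length = 1
    while codon_capacity(length) < needed:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, codon_capacity(n))  # 1 4, then 2 16, then 3 64

print(minimum_codon_length())    # 3
```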

The triplet code

3 bases = 1 amino acid

More than 1 triplet can code for the same amino acid

Translation: reads the information in RNA to order the amino acids in a protein

[Diagram: the mRNA, read 5’ to 3’ one codon at a time (AUG ACU UCA GUA ACC AUC UAA), aligned with the protein it encodes (Met Thr Ser Val Thr Ile), written from NH3+ to COO-]

Punctuation:

[Diagram: the same mRNA and protein, with the last codon (UAA) marked STOP]

Start: AUG = methionine, the first amino acid in (almost) all proteins

Stop: UAA, UAG, and UGA. NOT an amino acid!
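
The redundancy and punctuation of the code can be checked by building the standard codon table. The sketch below uses the common compact encoding of the standard genetic code (bases in the order U, C, A, G, one-letter amino acid codes, `*` for stop); the variable names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they specify.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

stop_codons = sorted(codons_for.pop("*"))
print(stop_codons)              # ['UAA', 'UAG', 'UGA']
print(len(codons_for))          # 20 amino acids
print(codons_for["M"])          # ['AUG'], the only codon for methionine
print(sorted(codons_for["L"]))  # six codons for leucine
```

Sixty-one codons are shared among twenty amino acids, so most amino acids are specified by more than one triplet.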

The Genetic Code: Who is the interpreter? Where’s the dictionary? What are the rules of grammar?

tRNA = transfer RNA

[Diagram: an aminoacyl tRNA synthetase attaches an amino acid (Met) to its tRNA, giving a charged tRNA; the anticodon of the charged tRNA (3’ UAC 5’) recognizes the codon in the mRNA (5’ AUG 3’) by base pairing]

The ribosome: mediates translation

Locates the 1st AUG, sets the reading frame for codon-anticodon base-pairing

5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’

After the 1st two tRNAs have bound…

[Diagram: the ribosome on the mRNA with the first two charged tRNAs bound, Met (anticodon UAC) and Thr (anticodon UGA)]

5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’

the ribosome breaks the Met-tRNA bond; Met is instead joined to the second amino acid

[Diagram: the ribosome with the Met tRNA (anticodon UAC) in the P-site and the Thr tRNA (anticodon UGA) in the A-site]

5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’

the ribosome breaks the Met-tRNA bond; Met is instead joined to the second amino acid

…and the tRNA that carried Met is released

…then the ribosome moves over by 1 codon in the 3’ direction

[Diagram: the ribosome with the emptied tRNA (anticodon UAC) leaving and Met joined to Thr on the second tRNA (anticodon UGA)]

5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’

[Diagram: Met joined to Thr on its tRNA (anticodon UGA), with the next charged tRNA, Ser (anticodon AGU), bound at the following codon]

5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’

[Diagram: the ribosome at the STOP codon (UAA); the peptide Met Thr Ser Val Thr Ile is attached to the last tRNA (anticodon UAG)]

When the ribosome reaches the Stop codon… termination

5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’

The finished peptide!

N-terminus NH3+ Met Thr Ser Val Thr Ile COO- C-terminus
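
The walk-through of the ribosome can be reproduced with a small virtual translation of the slide's mRNA. This is a simplified sketch: the function name `translate` is an assumption, the peptide is reported in one-letter codes, and start-site selection is reduced to "first AUG". Under the standard genetic code the sixth codon, AUC, decodes to isoleucine (I).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")  # the first AUG sets the reading frame
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: termination, no amino acid added
            break
        peptide.append(amino_acid)
    return "".join(peptide)


mrna = "AUAUGACUUCAGUAACCAUCUAACA"  # the mRNA used on the slides
print(translate(mrna))  # MTSVTI, i.e. Met Thr Ser Val Thr Ile
```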

Practice Question

Which strand on the DNA sequence is the coding (sense) strand? How can you tell?

Finding Sense in Nonsense

cbdryloiaucahjdhtheflybitthedogbutnotthecatjhhajctipheq

GGGTATAGAAAATGAATATAAACTCATAGACAAGATCGGTGAGGGAACATTTTCGTCAGTGTATAAAGCCAAAGATATCACTGGGAAAATAACAAAAAAATTTGCATCACATTTTTGGAATTATGGTTCGAACTATGTTGCTTTGAAGAAAATATACGTTACCTCGTCACCGCAAAGAATTTATAATGAGCTCAACCTGCTGTACATAATGACGGGATCTTCGAGAGTAGCCCCTCTATGTGATGCAAAAAGGGTGCGAGATCAAGTCATTGCTGTTTTACCGTACTATCCCCACGAGGAGTTCCGAACTTTCTACAGGGATCTACCAATCAAGGGAATCAAGAAGTACATTTGGGAGCTACTAAGAGCATTGAAGTTTGTTCATTCGAAGGGAATTATTCATAGAGACATCAAACCGACAAATTTTTTATTTAATTTGGAATTGGGGCGTGGAGTGCTTGTTGATTTTGGTCTAGCCGAGGCTCAAATGGATTATAAAAGCATGATATCTAGTCAAAACGATTACGACAATTATGCAAATACAAACCATGATGGTGGATATTCAATGAGGAATCACGAACAATTTTGTCCATGCATTATGCGTAATCAATATTCTCCTAACTCACATAACCAAACACCTCCTATGGTCACCATACAAAATGGCAAGGTCGTCCACTTAAACAATGTAAATGGGGTGGATCTGACAAAGGGTTATCCTAAAAATGAAACGCGTAGAATTAAAAGGGCTAATAGAGCAGGGACTCGTGGATTTCGGGCACCAGAAGTGTTAATGAAGTGTGGGGCTCAAAGCACAAAGATTGATATATGGTCCGTAGGTGTTATTCTTTTAAGTCTTTTGGGCAGAAGATTTCCAATGTTCCAAAGTTTAGATGATGCGGATTCTTTGCTAGAGTTATGTACTATTTTTGGTTGGAAAGAATTAAGAAAATGCGCAGCGTTGCATGGATTGGGTTTCGAAGCTAGTGGGCTCATTTGGGATAAACCAAACGGATATTCTAATGGATTGAAGGAATTTGTTTATGATTTGCTTAATAAAGAATGTACCATAGGTACGTTCCCTGAGTACAGTGTTGCTTTTGAAACATTCGGATTTCTACAACAAGAATTACATGACAGGATGTCCATTGAACCTCAATTACCTGACCCCAAGACAAATATGGATGCTGTTGATGCCTATGAGTTGAAAAAGTATCAAGAAGAAATTTGGTCCGATCATTATTGGTGCTTCCAGGTTTTGGAACAATGCTTCGAAATGGATCCTCAAAAGCGTAGTTCAGCAGAAGATTTACTGAAAACCCCGTTTTTCAATGAATTGAATGAAAACACATATTTACTGGATGGCGAGAGTACTGACGAAGATGACGTTGTCAGCTCAAGCGAGGCAGATTTGCTCGATAAGGATGTTCT

How do you find out if a sequence contains a gene? How do you identify the gene?

Reading Frame: the ribosome establishes the grouping of nucleotides that correspond to codons by the first AUG encountered.

ORF: open reading frame, from the first AUG to the first in-frame stop. The ORF encodes the information for the protein.

5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’

Starts counting triplets from this base (the A of the first AUG)

More generally: a reading frame with a stretch of codons not interrupted by a stop codon – non-coding RNAs!

Looking for ORFs

- read the sequence 5’ to 3’, looking for stop codons

- try each reading frame

- since we know the genetic code, can do a virtual translation if necessary
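
The ORF search described here can be sketched as a scan of the three forward reading frames. This is a minimal illustration, not a gene finder: the function name `find_orfs` is an assumption, only the given strand is scanned (the reverse complement would need the same treatment), and introns are ignored.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(sequence):
    """Return (start, end, orf) for each AUG-to-stop ORF in the three forward frames.

    Positions are 0-based and end-exclusive; DNA input is converted to RNA.
    """
    rna = sequence.upper().replace("T", "U")
    orfs = []
    for frame in range(3):                       # try each reading frame
        start = None
        for i in range(frame, len(rna) - 2, 3):  # read 5' to 3', one codon at a time
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i                        # first AUG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, rna[start:i + 3]))
                start = None                     # first in-frame stop closes it
    return sorted(orfs)


print(find_orfs("AUAUGACUUCAGUAACCAUCUAACA"))
# [(2, 23, 'AUGACUUCAGUAACCAUCUAA')]
```

On the slide's example the only ORF starts at the first AUG and ends at the in-frame UAA.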

How to identify genes experimentally?

Outline

• Information Flow in Genomics

• Gene Structure

• Genetic Linkage

• Chromatin Structure

• Genome Sequencing

Gene Structure: The Parts List

Genomic DNA for a protein-coding eukaryotic gene is composed of regulatory and coding sequences

[Diagram: a eukaryotic gene showing the enhancer (distal regulatory element), the promoter (proximal regulatory element), the 5’ UTR, exons, introns, and the 3’ UTR, with CRMs (cis-regulatory motifs) marked]

• CRMs can be upstream or downstream of the promoter, proximal or distal

Promoters

• Promoters are specific sites on DNA where RNA polymerase first binds to initiate the transcription of a gene

• Composed of a variety of different cis-sequence elements which recruit trans-acting factors through DNA-protein interactions

Core Promoter Elements

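[Diagram: the gene from the parts list (enhancer, promoter, 5’ UTR, exons, introns, 3’ UTR) with the core promoter enlarged; position markers at ~-50, ~-30, and +1]

Consensus sequences of the core promoter elements:

BRE: (G/C)(G/C)(G/A)CGCC

TATA: TATA(A/T)A(A/T), at ~-30

inr: PyPyAN(T/A)PyPy, at +1

- not all elements required

- many promoters lack a TATA box, using instead the functionally analogous initiator (inr) element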
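
The consensus sequences of the core promoter elements (BRE, TATA box, inr) can be written as regular expressions, reading Py as a pyrimidine (C or T) and N as any base. The sketch below is illustrative only: the function name `find_motifs` and the test sequence are hypothetical, and short consensus matches occur by chance in real genomes, so a match alone does not identify a promoter.

```python
import re

# Consensus sequences as regular expressions (Py = C or T, N = any base).
CORE_PROMOTER_MOTIFS = {
    "BRE": "[GC][GC][GA]CGCC",
    "TATA": "TATA[AT]A[AT]",
    "Inr": "[CT][CT]A[ACGT][TA][CT][CT]",
}


def find_motifs(dna):
    """Return the 0-based start positions of each consensus match."""
    dna = dna.upper()
    return {
        name: [match.start() for match in re.finditer(pattern, dna)]
        for name, pattern in CORE_PROMOTER_MOTIFS.items()
    }


# Hypothetical sequence built to contain one copy of each element.
example = "GG" + "CCACGCC" + "G" + "TATAAAA" + "GGGCGAG" * 3 + "TCAGTCC" + "GG"
print(find_motifs(example))  # {'BRE': [2], 'TATA': [10], 'Inr': [38]}
```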

Combinatorial Gene Regulation

• Most eukaryotic genes have multiple cis regulatory motifs located outside of the core promoter region

• Can be located in promoter proximal regions, 3’ downstream regions, and many kb away from the target gene

• Allows for combinatorial control of gene expression

Distal regulatory elements: Enhancers

[Figure: an enhancer and the “enhanceosome” assembled on it. Source: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mcb.figgrp.2601]

- Can function in either orientation

- Can occur far (>50 kb) from the gene

- Can be up or downstream

- Range in size from ~50 to 200 bp

- Contain multiple TF binding sites

Untranslated Regions (UTRs)

[Diagram: a transcript with the 5’ UTR, the exons, and the 3’ UTR labeled]

• Most eukaryotic mRNAs contain untranslated regions at their 5’ and 3’ ends

• The 5’ UTR is the region between the start of transcription and the start of translation

• The 3’ UTR is the region between the stop codon and the poly-A tail

• Both the 5’ and 3’ UTRs can contain cis regulatory sequences that bind TFs, influence transport to the cytoplasm, mediate transcript stability, and exert translational control

Alternative Splicing

• Pre-mRNA from some genes can be spliced into two or more distinct transcripts

• Creates protein diversity (isoforms)

[Diagram: alternative splicing, with the 5’ splice site and the 3’ splice site marked]

Outline

• Information Flow in Genomics

• Gene Structure

• Genetic Linkage

• Chromatin Structure

• Genome Sequencing

Transmission of Genetic Information

[Diagram: chromosomes alternating between condensed and decondensed states through cell division; ploidy labels: diploid (2N) and haploid (1N)]

Elements of cell division

Cell growth

Chromosome duplication

Chromosome segregation

Meiosis

Interphase: Chromosomes replicate

Meiosis I: Reductive division, homologous chromosomes separate

Meiosis II: Sister chromatids separate

Recombination

How Does Distance Between Loci Affect Transmission?

Independent Assortment: loci are unlinked or far enough apart that they are transmitted independently from one another

Genetic linkage: loci are close enough together on a chromosome to be transmitted together

Genetic Mapping

The frequency of recombination between loci depends on the distance between them: the farther apart two loci are, the more often they recombine

Recombination Is A Measure of Distance

• Recombination fraction, θ = the probability that a recombinant gamete is transmitted

• If two loci are on different chromosomes, they will segregate independently

=> recombination fraction θ = 0.5

• If two loci are right next to each other, they will segregate together during meiosis

=> recombination fraction θ = 0

• Jargon:

θ < 0.5: the loci are close (they are linked)

θ = 0.5: the loci are far apart (they are not linked)

Recombination Is A Measure of Distance

 

$$\text{Map Distance} = \frac{\text{Number of Recombinant Gametes}}{\text{Total Number of Gametes}} \times 100$$

Centimorgan (cM): a unit of chromosome length, equals the length of chromosome over which crossing-over occurs with 1% frequency

Practice Question

• In maize, consider three recessive phenotypes: lazy growth (ll), glossy leaves (gg), and sugary endosperm (ss).

• The following cross was made: Ll Gg Ss x ll gg ss. The observed progeny distribution was as follows (neither gene order nor linkage phase is known)

Phenotype Number

wildtype 286

lazy 33

glossy 59

sugary 4

lazy, glossy 2

lazy, sugary 44

glossy, sugary 40

lazy, glossy, sugary 272

Total 740

• Determine order and distances among the three genes

Where to begin?

Rule 1: The two most frequent gamete types are the parental types

Parental types will constitute ≥ 50% of all progeny, so…

L G S / l g s x l g s / l g s

[Diagram labels on the cross: “Recomb.”; “Wild-type for all” (L G S / l g s); “lazy, glossy, sugary” (l g s / l g s)]

L G S // l g s x l g s // l g s

Progeny Phenotype        Progeny Genotype     Number
wildtype                 L G S // l g s       286
lazy                     l G S // l g s       33
glossy                   L g S // l g s       59
sugary                   L G s // l g s       4
lazy, glossy             l g S // l g s       2
lazy, sugary             l G s // l g s       44
glossy, sugary           L g s // l g s       40
lazy, glossy, sugary     l g s // l g s       272
Total                                         740

Linkage phase in heterozygous parent?

• L G S / l g s or L g S / l G s or l g S / L G s or L g s / l G S

Rule 2

• The double-recombinant gametes will be the two least frequent types.

[Diagram: the two homologs, A B C and a b c]

Progeny Phenotype        Progeny Genotype     Number
wildtype                 L G S / l g s        286
lazy                     l G S / l g s        33
glossy                   L g S / l g s        59
sugary                   L G s / l g s        4
lazy, glossy             l g S / l g s        2
lazy, sugary             l G s / l g s        44
glossy, sugary           L g s / l g s        40
lazy, glossy, sugary     l g s / l g s        272
Total                                         740

Rule 3

• The effect of a double crossover is to interchange the members of the middle pair of alleles between the chromosomes

Before: A B C and a b c

After a double crossover: A b C and a B c

Parental types: L G S and l g s

Double-crossover types: L G s and l g S

Which gene is in the middle? S, because it is the only gene whose alleles are swapped in the double-crossover types relative to the parental types.

Parental types in the correct gene order: L S G and l s g

Double-crossover types in the correct gene order: L s G and l S g

Now you know the linkage phase of the heterozygous parent and the gene order… how far apart are these genes?

Count the cross-overs between adjacent genes

Parental homologs: L S G and l s g

• In parents, L allele on same homolog as S and l on same homolog as s. So if these get broken up => cross-over between L and S loci

• In parents, S on same homolog as G and s on same homolog as g. If these get broken up => recombination between S and G loci

Rule 4: Reciprocal products expected to occur in approximately equal numbers

• LGS ≈ lgs (286 ≈ 272)

• LgS ≈ lGs (59 ≈ 44)

• Lgs ≈ lGS (40 ≈ 33)

• LGs ≈ lgS (4 ≈ 2)

Progeny Phenotype        Progeny Genotype     #
wildtype                 L G S / l g s        286
lazy                     l G S / l g s        33
glossy                   L g S / l g s        59
sugary                   L G s / l g s        4
lazy, glossy             l g S / l g s        2
lazy, sugary             l G s / l g s        44
glossy, sugary           L g s / l g s        40
lazy, glossy, sugary     l g s / l g s        272
Total                                         740

Rec Freq L-S: l G S 33 + L g s 40 + L G s 4 + l g S 2 = 79

Rec Freq S-G: L g S 59 + l G s 44 + L G s 4 + l g S 2 = 109

Progeny Phenotype        Progeny Genotype     #      Crossover or Non-Crossover?
wildtype                 L G S / l g s        286    Parental (NCO)
lazy                     l G S / l g s        33     single CO between L and S
glossy                   L g S / l g s        59     single CO between S and G
sugary                   L G s / l g s        4      double CO
lazy, glossy             l g S / l g s        2      double CO
lazy, sugary             l G s / l g s        44     single CO between S and G
glossy, sugary           L g s / l g s        40     single CO between L and S
lazy, glossy, sugary     l g s / l g s        272    Parental (NCO)
Total                                         740

Rec Freq L-S: l G S 33 + L g s 40 + L G s 4 + l g S 2 = 79

79/740 or 10.7% of gametes are recombinant between L & S, so the distance between L & S = 10.7 map units

Rec Freq S-G: L g S 59 + l G s 44 + L G s 4 + l g S 2 = 109

109/740 or 14.7% of gametes are recombinant between S & G, so the distance between S & G = 14.7 map units

L ---- 10.7 mu ---- S ---- 14.7 mu ---- G
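
The four rules of the three-point cross can be applied mechanically to the progeny counts. The Python sketch below is one possible way to do that; the function names and the dictionary layout (gametes written in the arbitrary order L, G, S) are assumptions made for the example. Note that 109/740 ≈ 0.1473, which rounds to 14.7 map units.

```python
# Gametes from the heterozygous parent, written in the arbitrary order L, G, S.
counts = {
    "LGS": 286, "lGS": 33, "LgS": 59, "LGs": 4,
    "lgS": 2, "lGs": 44, "Lgs": 40, "lgs": 272,
}


def recombination_percent(counts, parental, i, j):
    """Map distance between genes i and j: % of gametes with a non-parental allele pair."""
    parental_pairs = {(gamete[i], gamete[j]) for gamete in parental}
    recombinant = sum(
        n for gamete, n in counts.items()
        if (gamete[i], gamete[j]) not in parental_pairs
    )
    return round(100 * recombinant / sum(counts.values()), 1)


def three_point_cross(counts):
    ranked = sorted(counts, key=counts.get, reverse=True)
    parental = ranked[:2]     # Rule 1: the two most frequent types
    double_co = ranked[-2:]   # Rule 2: the two least frequent types
    # Rule 3: a double crossover swaps only the middle gene, so one double-crossover
    # gamete differs from the first parental gamete at exactly one position.
    diffs = [i for i in range(3) if parental[0][i] != double_co[0][i]]
    if len(diffs) != 1:
        diffs = [i for i in range(3) if parental[0][i] != double_co[1][i]]
    middle = diffs[0]
    left, right = [i for i in range(3) if i != middle]
    genes = parental[0].upper()
    return {
        "parental": sorted(parental),
        "double_crossover": sorted(double_co),
        "order": genes[left] + genes[middle] + genes[right],
        "distances": {
            genes[left] + "-" + genes[middle]:
                recombination_percent(counts, parental, left, middle),
            genes[middle] + "-" + genes[right]:
                recombination_percent(counts, parental, middle, right),
        },
    }


result = three_point_cross(counts)
print(result["order"])      # LSG
print(result["distances"])  # {'L-S': 10.7, 'S-G': 14.7}
```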

Outline

• Information Flow in Genomics

• Gene Structure

• Genetic Linkage

• Chromatin Structure

• Genome Sequencing

Chromosome Structure: Coils of Coils of Coils…

[Diagram: successive levels of DNA packaging, from the nucleosome to the fully condensed chromosome at mitosis]

Local unpacking of chromatin allows gene expression and replication

Nucleosomes

• ~146 bp of DNA wrapped around each nucleosome

• ~80 bp linker

• histone octamer

Histone Modification and Chromatin Activity

• modifications change interaction with DNA and trans-factors

• can activate or repress transcription

• reinforce regulatory patterns set up by TFs

What Do These Modifications Do? A Histone Code?

Carey et al. Cell (2007) 128:707

“Distinct histone modifications, on one or more tails, act sequentially or in combination that is read by other proteins to bring about distinct downstream events” (Strahl and Allis, 2000, Nature, 403:41)

Outline

• Information Flow in Genomics

• Gene Structure

• Genetic Linkage

• Chromatin Structure

• Genome Sequencing

DNA Sequencing Technology

• Sanger sequencing

• Next-Generation

• 3rd and 4th Generation

Genome Sequencing: Hierarchical Shotgun Sequencing

• Shear genomic DNA into smaller pieces and subclone into library (such as BACs, Cosmids, etc.)

• Create physical map

• Shotgun sequence each BAC from minimal tiling path (shearing of ~150kb BAC clone into ~ 2kb fragments)

• Data from linkage and physical maps used to assemble sequence maps of chromosomes

Genome Sequencing: Whole Genome Shotgun Sequencing

• Whole genome randomly sheared three times

  – Plasmid library constructed with ~2 kb inserts

  – Plasmid library with ~10 kb inserts

  – BAC library with ~200 kb inserts

• Computer program assembles sequences into chromosomes

• No physical map construction

• Only one BAC library

• Overcomes problems of repeat sequences…only not really
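
The step in which a computer program assembles shotgun reads can be illustrated with a toy greedy overlap assembler. This is a deliberately simplified sketch and not the algorithm used by any real assembler: the function names, the minimum overlap of 3 bases, and the example reads are assumptions, and real reads contain errors and repeats that this approach cannot handle.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # nothing overlaps any more: leave separate contigs
            break
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


# Hypothetical reads sampled from the sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

If the same stretch of sequence occurs in several places, overlaps become ambiguous and a greedy merge can join the wrong reads, which is one way to see why repeats remain a problem for shotgun assembly.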

Next-Generation Sequencing Technology

• Illumina HiSeq:

– 4 billion reads per flow cell X 100 bases, paired = 400 Gbp

– 8 samples per flow cell = 50 Gbp each (one human genome = 3 Gbp)

– Reagent cost ~$8K per run

Updated: HiSeq 3000/4000 SBS Kits enable up to 1500 Gb (1.5 Tb) of output per dual flow cell run

• ABI SOLID: similar yield

• Roche 454: 1 million reads X 500 bases = 0.5 Gbp
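
The yield figures above are simple products of read count and read length. A short Python check of the arithmetic, using only the numbers quoted on the slide; the variable and function names are assumptions, and the coverage figure is a derived illustration rather than a number from the slide.

```python
HUMAN_GENOME_GBP = 3  # one human genome = 3 Gbp, as stated on the slide


def yield_gbp(reads, read_length):
    """Total sequenced bases, in Gbp (billions of base pairs)."""
    return reads * read_length / 1e9


hiseq_gbp = yield_gbp(4_000_000_000, 100)  # 4 billion reads x 100 bases
per_sample_gbp = hiseq_gbp / 8             # 8 samples per flow cell
coverage = per_sample_gbp / HUMAN_GENOME_GBP
roche_454_gbp = yield_gbp(1_000_000, 500)  # 1 million reads x 500 bases

print(hiseq_gbp)           # 400.0
print(per_sample_gbp)      # 50.0
print(round(coverage, 1))  # 16.7, i.e. each sample covers a human genome about 17 times
print(roche_454_gbp)       # 0.5
```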

Illumina sequencing

[Figure: the Illumina sequencing workflow in four numbered panels (1. to 4.). Source: Mardis, ER, 2008, ARGHG]

Illumina sequencing: clusters

Illumina sequencing: sequence reaction

Illumina sequencing: sequence reaction

Sequence clusters are imaged after each cycle of synthesis

What is missed?

Plenty: repetitive DNA and structural variation

Example: short tandem repeats

[Figure: a short tandem repeat built from C, A, and G bases]

3rd Generation Sequencing Technology

• Single Molecule Real Time (SMRT) sequencing technology (PacBio RS)

• based on ‘circular’ DNA molecules read by polymerase

• long reads - up to 10 kb

• error-prone

4th Generation Sequencing Technology

• Protein nanopore sequencing (Oxford Nanopore)

• ultra-long reads - up to 1 Mb, limited by integrity of the DNA

• high error rate, low throughput

Next-Gen Sequencing - What’s All the Fuss About?

The Era of Personal Genomics?

James D. Watson (5/31/2007)

J. Craig Venter (8/4/2007)

Image sources: http://www.ffrf.org/day/img/0406_watson.gif, http://www-news.uchicago.edu/releases/07/images/070601.watson.jpg, http://www.usnews.com/usnews/images/news/photos/venter051022.jpg

It is here. The challenge is interpretation.

“Censoring” of Watson’s ApoE gene

[Figure label: 3.6 kb]

Important ethical issues confront personal genomics.

Interpreting Genome Sequences

The ENCODE Project: comprehensive parts list of the functional elements in the human genome

• Pilot Project Description – ENCODE Project Consortium et al. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science (2004) vol. 306 (5696)

• Pilot Project Results – ENCODE Project Consortium et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) vol. 447 (7146)

Let’s Play “Gene” or “No Gene”

A gene is often a segment of DNA that encodes a protein.

How about DNA that encodes:

a micro RNA that binds to an mRNA to inhibit translation?

an RNA spliced out of an intron and used for another function?

an antisense transcript?

a long non-coding RNA of unknown function?

a pseudogene?