LECTURE 2. DNA Sequencing and Structural Genomics

Sequencing with DNA Polymerases and Chain Terminators (Sanger sequencing)

Synthesize new DNA using cloned DNA as the template. Depends on hybridization of a primer to the DNA template; chain terminators (dideoxynucleotides, ddNTPs) stop synthesis wherever they are incorporated.

Fred Sanger: 1980 Nobel Prize
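The chain-termination idea can be sketched as a toy simulation. This is an idealized model, not a lab protocol: it assumes every possible termination site shows up as a fragment, measures fragment length from the primer's 3' end, and takes the template written 3'->5' so the new strand is its base-by-base complement. The function names are illustrative only.

```python
from __future__ import annotations

# Watson-Crick pairing partner for each base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize(template_3to5: str) -> str:
    """New strand, written 5'->3', copied from a template given 3'->5'.

    Position 1 is the first base added after the primer's 3' end.
    """
    return "".join(COMPLEMENT[base] for base in template_3to5)


def sanger_lanes(template_3to5: str) -> dict[str, list[int]]:
    """Fragment lengths produced in each of the four ddNTP reactions.

    Idealized: every termination site is represented, so the ddA lane
    holds a fragment ending at every A in the new strand, and so on.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesize(template_3to5), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the ladder from the bottom (shortest) band to the top (longest)."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


print(read_gel(sanger_lanes("TACGGA")))  # ATGCCT
```

Reading the four lanes together from shortest to longest fragment recovers the new strand, which is how a manual four-lane Sanger gel is read from bottom to top.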

Manual Sanger Sequencing

Properties of DNA Pols used for Sequencing

Enzyme                  3' exo   Processivity*   Rate of polymerization#
Klenow                  (+)      10-50           45
Reverse Transcriptase   (-)      10              5
T7 Sequenase**          (-)      2000-3000       300
Taq                     (-)      7500            35-100
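As a rough illustration of why processivity matters, the sketch below runs arithmetic on the table's numbers. It reads "processivity" as nucleotides added per binding event (the usual definition) and ASSUMES the rate column is in nucleotides per second, since the slide does not state units. It uses the lower bound of each range and a hypothetical 600-nt read.

```python
import math

# Lower bound of each range from the table; rate units ASSUMED to be nt/s.
POLYMERASES = {
    "Klenow": (10, 45),
    "Reverse Transcriptase": (10, 5),
    "T7 Sequenase": (2000, 300),
    "Taq": (7500, 35),
}


def binding_events(read_len: int, processivity: int) -> int:
    """Minimum number of times the enzyme must (re)bind to extend read_len nt."""
    return math.ceil(read_len / processivity)


def extension_time_s(read_len: int, rate_nt_per_s: float) -> float:
    """Polymerization time for read_len nt, ignoring time spent rebinding."""
    return read_len / rate_nt_per_s


for name, (processivity, rate) in POLYMERASES.items():
    print(f"{name}: {binding_events(600, processivity)} binding events, "
          f"{extension_time_s(600, rate):.1f} s for a 600-nt read")
```

Under these assumptions a low-processivity enzyme such as Klenow must rebind dozens of times to make a 600-nt product, while Sequenase can do it in a single binding event.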

Major Problem with Sanger sequencing:

Single-stranded DNA forms secondary structures through intramolecular Watson-Crick base pairs.

These cause stops and compressions (gel artifacts: bands run closer together than normal spacing). This is especially a problem in GC-rich regions, which form stable "hairpins".
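A crude way to flag hairpin-prone stretches is to scan for a short stem followed, after a small loop, by its own reverse complement. This is only a pattern scan under hypothetical defaults (4-base stem, 3-8 base loop, perfect pairing); real folding tools estimate stability thermodynamically rather than by exact matching.

```python
from __future__ import annotations

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (higher suggests a more stable stem)."""
    return sum(base in "GC" for base in seq) / len(seq)


def find_hairpins(seq: str, stem: int = 4, min_loop: int = 3,
                  max_loop: int = 8) -> list[tuple[int, str, int, float]]:
    """Return (start, stem_seq, loop_len, stem_gc) for perfect simple hairpins.

    A hit is a stem of `stem` bases, a loop of min_loop..max_loop bases,
    then the reverse complement of the stem, so the single strand can fold
    back and base-pair with itself.
    """
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = revcomp(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i, left, loop, gc_fraction(left)))
    return hits


print(find_hairpins("GCGCTTTTGCGC"))  # [(0, 'GCGC', 4, 1.0)]
```

Hits whose stem GC fraction is high correspond to the GC-rich hairpins that tend to cause stops and compressions.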

STRATEGIES for DNA SEQUENCING

-DIRECTED SEQUENCING: Start at ends of cloned DNA molecule using UNIVERSAL PRIMER SITES present in the vector sequence. Design a new sequencing primer based on the first round of sequence to continue the job: PRIMER WALKING

USED FOR SMALLER DNAs (e.g., cDNAs): <10 kb

-RANDOM SEQUENCING: Fragment the cloned DNA randomly and subclone pieces into vector. Sequence all clones using UNIVERSAL PRIMER. Use a computer to align sequence overlaps and determine the entire sequence of the starting DNA

USED FOR LONG DNAs: BACs, etc. (GENOMIC)

PRIMER WALKING
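Primer walking can be sketched as a loop over read positions along one strand. The read length (600 bases) and the 50-base overlap between consecutive reads are hypothetical parameters; the point is that every step after the first needs a newly designed primer, which is why the approach suits smaller inserts.

```python
from __future__ import annotations


def primer_walk(insert_len: int, read_len: int = 600, overlap: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) of each read when walking along one strand.

    Round 1 starts at position 0, next to the universal primer site in the
    vector. Each new primer (hypothetical placement) sits `overlap` bases
    before the end of the previous read, so consecutive reads overlap.
    """
    if overlap >= read_len:
        raise ValueError("overlap must be shorter than the read length")
    reads = []
    start = 0
    while True:
        end = min(start + read_len, insert_len)
        reads.append((start, end))
        if end == insert_len:
            return reads
        start = end - overlap  # design the next primer inside the last read


walk = primer_walk(2000)
print(walk)  # [(0, 600), (550, 1150), (1100, 1700), (1650, 2000)]
print(f"{len(walk) - 1} custom primers needed after the universal primer")
```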

RANDOM SEQUENCING

BAC clone
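The "use a computer to align sequence overlaps" step can be sketched with a simple greedy overlap merge. This is a teaching simplification, not the algorithm of any particular assembler: it assumes error-free reads from one strand, no repeats, and a hypothetical minimum overlap of 3 bases.

```python
from __future__ import annotations


def overlap_len(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_len(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left: remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


reads = ["GCCGGAATAC", "ATTAGACCTG", "CCTGCCGGAA"]  # subclone reads, in random order
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can be misled by repeated sequence, one reason large genomes need more careful assembly.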

[Figure: gene counts of representative genomes: 4100 genes, 6000 genes, 18,000 genes, 14,000 genes, 35-70,000 genes?, 50 genes]

Genomes are LARGE and impractical to sequence by manual methods

BOTTLENECKS IN LARGE-SCALE AUTOMATED SEQUENCING:

-sub-cloning of target DNA into appropriate vectors
-preparation of DNA of quality suitable for sequencing
-setting up sequencing reactions
-pouring and loading sequencing gels
-GEL ELECTROPHORESIS ARTIFACTS (due to secondary DNA structures)

ALTERNATIVES to gels for separating sequencing products:

-sequencing by HYBRIDIZATION
-mass spectrometry: Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS)
-capillary electrophoresis

Capillary: 50-100 µm in diameter, 40 cm long

1. Ultra-thin, long gels can be run at very high voltages (2 kV to 10 kV): short runs, theoretically good separation.
2. Samples can be loaded directly from 96-well plates by electrophoresis: easy to automate.
3. Non-polymerized gel media are used: they can be automatically removed and replaced between runs, so you don't have to take apart and pour sequencing gels.
4. Capillaries can be clustered: the new automated model has 96-capillary arrays.

The ABI 3700 Automated Sequencer: Quick, Cheap Genome Sequencing

Emission Spectra of dyes used with the ABI3700
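With four dyes, each peak in the trace is assigned the base whose dye channel is brightest. The sketch below is a simplification: it assumes the overlapping emission spectra have already been separated into four clean channel intensities, and the channel-to-base order and the 1.5 ambiguity ratio are hypothetical, not the ABI 3700's actual settings.

```python
from __future__ import annotations

# HYPOTHETICAL channel order: the real dye-to-base assignment is
# chemistry-specific and is not given on the slide.
CHANNEL_BASES = ("G", "A", "T", "C")


def call_bases(peaks: list[tuple[float, float, float, float]], min_ratio: float = 1.5) -> str:
    """Call one base per peak from four (already separated) dye intensities."""
    calls = []
    for intensities in peaks:
        ranked = sorted(range(4), key=lambda k: intensities[k], reverse=True)
        top, second = intensities[ranked[0]], intensities[ranked[1]]
        if top <= 0 or (second > 0 and top / second < min_ratio):
            calls.append("N")  # no signal, or two dyes too close to call
        else:
            calls.append(CHANNEL_BASES[ranked[0]])
    return "".join(calls)


trace = [(900, 50, 30, 20), (40, 850, 60, 10), (30, 20, 35, 700), (400, 380, 10, 5)]
print(call_bases(trace))  # GACN
```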

Front View

Fully Automated System that Requires 5 min of manpower per run:

Example: let's say that a 9 kV run reliably gives us 600 bp per capillary per run

4 runs (10 hr day) X 96 capillaries X 600 bp = 230,400 bp per day!
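The throughput arithmetic above can be written out, and extended to ask how long one instrument would need for a single pass over a genome. The ~3 billion bp target is an assumed round figure for one human genome equivalent (1X), used only for illustration.

```python
import math


def bp_per_day(runs_per_day: int, capillaries: int, read_len_bp: int) -> int:
    """Raw sequence output of one instrument per day."""
    return runs_per_day * capillaries * read_len_bp


def machine_days(target_bp: int, daily_bp: int) -> int:
    """Instrument-days needed to produce target_bp of raw sequence (rounded up)."""
    return math.ceil(target_bp / daily_bp)


daily = bp_per_day(runs_per_day=4, capillaries=96, read_len_bp=600)
print(daily)  # 230400
# ASSUMPTION: ~3 billion bp as a round figure for 1X of the human genome.
print(machine_days(3_000_000_000, daily))  # 13021
```

Roughly 13,000 machine-days for a single 1X pass explains why genome centers ran large fleets of these instruments in parallel.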

Human Genome Project Goals: Three Orderly Steps to Complete the Genome Sequence

1) Complete Genetic Map

The 1999 map is based on 42,000 STSs and ESTs (representing 30,000 genes) and 1102 informative microsatellite markers: http://www.ncbi.nlm.nih.gov/genemap/

Currently, ~4.8 million single nucleotide polymorphisms (SNPs) are mapped.

1 SNP every 1200 bp, on average

~25,000 SNPs associated with genes

2) Physical Map is largely assembled

BAC Contigs for the Human Genome
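One way BAC clones can be grouped into contigs is STS-content mapping: two clones that contain the same STS must overlap. The sketch below only groups clones (it does not order them within a contig) using a union-find structure; the clone and STS names are hypothetical.

```python
from __future__ import annotations


def bac_contigs(sts_hits: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs: clones sharing any STS end up together."""
    parent = {clone: clone for clone in sts_hits}

    def find(clone: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    first_clone_with: dict[str, str] = {}
    for clone, markers in sts_hits.items():
        for sts in markers:
            if sts in first_clone_with:
                parent[find(clone)] = find(first_clone_with[sts])  # join the two groups
            else:
                first_clone_with[sts] = clone

    groups: dict[str, set[str]] = {}
    for clone in sts_hits:
        groups.setdefault(find(clone), set()).add(clone)
    return sorted(groups.values(), key=sorted)


hits = {"BAC1": {"sts1", "sts2"}, "BAC2": {"sts2", "sts3"},
        "BAC3": {"sts3"}, "BAC4": {"sts9"}}
print(bac_contigs(hits))  # BAC1-BAC3 form one contig; BAC4 is on its own
```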

3) As of 25 May 1999, ~19% of the genome was sequenced (+63% in “draft”): http://www.ncbi.nlm.nih.gov/genome/seq/

Goal: finish the entire sequence by 2003 (original goal was 2005). Cost: $3 billion

Shotgun Sequencing the Human Genome: >90% of the genome has been completed by Celera since spring 2000

Venter JC, Adams MD, Sutton GG, Kerlavage AR, Smith HO, Hunkapiller M. 1998. Shotgun sequencing of the human genome. Science 1 5:1540-1542.

The Human Genome Project plan is ordered: genetic map, contigs, then completely sequence the BACs that make up the contigs.

Shotgun Approach (already proven successful for many bacterial genomes and, in 2000, for Drosophila):
-just start sequencing random clones without bothering to order them
-sequence them only from the ends (not completely)
-sequence enough random clones this way and you will cover the entire genome
-use sophisticated computer programs to put the genome back together

Covering the genome. A 100-kbp portion of the genome showing expected clone coverage needed for shotgun sequencing.

Shotgun Approach: Randomly sequence clones from different types of libraries
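How many random reads are enough? Under the Lander-Waterman model, which assumes reads land uniformly at random, coverage c = N L / G and the expected fraction of bases missed is e^-c. The sketch applies this to a 100-kbp region with a hypothetical 600-bp read length and 5X target.

```python
import math


def coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Redundancy of coverage c = N * L / G."""
    return num_reads * read_len / genome_len


def reads_needed(c: float, read_len: int, genome_len: int) -> int:
    """Reads required to reach coverage c (rounded up)."""
    return math.ceil(c * genome_len / read_len)


def fraction_uncovered(c: float) -> float:
    """Poisson probability that a given base is covered by no read: e^-c."""
    return math.exp(-c)


def expected_contigs(num_reads: int, c: float) -> float:
    """Lander-Waterman expected number of contigs, N * e^-c (no minimum overlap)."""
    return num_reads * math.exp(-c)


n = reads_needed(5, 600, 100_000)  # 834 reads for ~5X over 100 kbp
c = coverage(n, 600, 100_000)
print(n, round(fraction_uncovered(c), 4), round(expected_contigs(n, c), 1))
print(fraction_uncovered(11))  # expected uncovered fraction at 11X coverage
```

Even at high coverage a random model leaves a few gaps, and real genomes add unclonable and repetitive regions that the model ignores.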

35 billion bases to be sequenced
Time: less than 1 year
Cost: ~$250 million

April 2000: Celera finishes the sequencing phase of the project: 11X coverage of the genome of four to five individuals
September 2000: Initial assembly of the human genome completed (using sequences in public databases as well)
October 2000: Sequencing phase of the mouse genome project completed; ~9 billion base pairs

Problems with this approach:

-only 90-95% of the genome can be sequenced: many gaps for others to fill
-the sequence will not be annotated and may not be released in a timely fashion: in fact, you need to subscribe to Celera for this info (cost: $450,000 minimum per university)
-are they doing this just to get a jump on patenting genes? Ethical problems??

Whose DNA was sequenced? Craig Venter's (Celera)

What about the Genome Consortium?

May 1999: ~19% sequenced (+63% in “draft”)
Sept 2000: ~24% sequenced (+66% in “draft”)
Oct 18, 2001: ~47% sequenced (+51% in “draft”)

Genome Watch, 23 Oct 2002:
Finished   92.8%
Draft       5.8%
Total      98.6%

Was Shotgun Sequencing of the Human Genome Successful?

The Celera assembly depended on BAC tiles in the public database; gaps in the Celera sequence were filled with sequence obtained from the public database

Waterston RH, Lander ES, Sulston JE. 2002. On the sequencing of the human genome. Proc Natl Acad Sci USA 99:3712-371.

NO!

Myers EW, Sutton GG, Smith HO, Adams MD, Venter JC. 2002. On the sequencing and assembly of the human genome. Proc Natl Acad Sci USA 99:4145-4146.

SORE LOSERS!

The Truth: Both Approaches Are Required to Sequence Large Genomes!

Where are we now? Estimates range from 2% to 20% of the genome still remaining to be sequenced.

Completion of the genome is likely still 2-5 years away: gaps in BACs to fill; “unclonable” sequences?

For example, there is still controversy over how many genes are encoded in the human genome: 30,000 or 70,000?

Chr 21 BAC/gene map Chr 15 BAC/gene map

See http://www.ncbi.nih.gov/cgi-bin/Entrez/hum_srch