
BioSci 203 lecture 22 page 1 ©copyright Bruce Blumberg 2001. All rights reserved

Bio Sci 203 Lecture 22 - cDNA library screening & sequence characterization

• Bruce Blumberg ([email protected])

– office - 4203 McGaugh Hall

– 824-8573

– lab 5427 (x46873), 5305 (x43116)

– office hours Wednesday 1-2.

• Today

– Characterization of Selected DNA Sequences

• DNA sequence analysis


Analysis of genes and cDNAs

• Characterization of cloned DNA

– what things do we want to know about a new gene?

• Complete DNA sequence

– cDNA sequence

– genomic sequence?

– Restriction enzyme maps?

• Where are introns and exons?

– particularly important if knockouts are planned

• where is the promoter(s)?

– Alternative promoter use?

– Mapping transcription start(s)

• where and when is mRNA expressed?

– How abundantly is it expressed in each place?

– Is there any association between expression levels and putative function?

• What is the function of this gene?

– Loss-of-function analysis is decisive

» knockout

» antisense

» mutant mRNA, e.g. dominant negative

– Gain-of-function analysis may be helpful

» transgenic

» mutant mRNA, e.g. constitutively active transcription factor


Analysis of genes and cDNAs (contd)

• Landmarks in DNA sequencing

– Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467 (1977).

– Sanger, F. et al. The nucleotide sequence of bacteriophage ΦX174. J. Mol. Biol. 125, 225-246 (1978).

– Sutcliffe, J. G. Complete nucleotide sequence of the Escherichia coli plasmid pBR322. Cold Spring Harb. Symp. Quant. Biol. 43, 77-90 (1979).

– Sanger, F. et al. Nucleotide sequence of bacteriophage lambda DNA. J. Mol. Biol. 162, 729-773 (1982).

– Messing, J., Crea, R. & Seeburg, P. H. A system for shotgun DNA sequencing. Nucleic Acids Res. 9, 309-321 (1981).

– Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457-465 (1981).

– Deininger, P. L. Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal. Biochem. 129, 216-223 (1983).

– Baer, R. et al. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310, 207-211 (1984). (189 kb)

– Innis, M. A. et al. DNA sequencing with Thermus aquaticus DNA polymerase and direct sequencing of polymerase chain reaction-amplified DNA. Proc. Natl. Acad. Sci. USA 85, 9436-9440 (1988).


Analysis of genes and cDNAs (contd)

• Landmarks in DNA sequencing (contd)

– 1995 - Haemophilus influenzae (1.83 Mb)

» first bacterium sequenced, human pathogen

– 1995 - Mycoplasma genitalium (0.58 Mb)

» smallest free-living organism

– 1996 - Saccharomyces cerevisiae genome (13 Mb)

– 1996 - Methanococcus jannaschii (1.66 Mb)

» first archaebacterium

– 1997 - Escherichia coli (4.6 Mb)

– 1997 - Bacillus subtilis (4.2 Mb)

– 1997 - Borrelia burgdorferi (1.44 Mb)

» Lyme disease

– 1997 - Archaeoglobus fulgidus (2.18 Mb)

» first sulfur-metabolizing organism (an archaeon)

– 1997 - Helicobacter pylori (1.66 Mb)

» first bacterium linked to cancer

– 1998 - Treponema pallidum (1.14 Mb)

– 1998 - Caenorhabditis elegans genome (97 Mb)

– 1999 - Deinococcus radiodurans (3.28 Mb)

» resistant to radiation, starvation, oxidative stress

– 2000 - Drosophila melanogaster (120 Mb)

– 2000 - Arabidopsis thaliana (115 Mb)

– 2001 - Escherichia coli O157:H7 (4.1 Mb)

– 2001 - Human “genome”

– 2002 - mouse genome

– 2002 - Ciona intestinalis


Genome sequencing

• DOE – Joint Genome Institute

– http://www.jgi.doe.gov/

– Numerous advances in sequencing technology

• Increased pass rate from ~70% to > 90%

• Lowered cost nearly 3-fold


The human genome

• On Feb 12, 2001, Celera and the Human Genome Project published “draft” human genome sequences

– Celera -> 39,114 predicted genes

– Ensembl -> 29,691 predicted genes

– Consensus from all sources ~30K

• Number of genes in other genomes

– C. elegans - 19,000

– Arabidopsis - 25,000

• Predictions had been from 50-140K human genes

– What’s up with that?

– Are we only slightly more complicated than a weed?

– How can we possibly get a human with fewer than twice as many genes as C. elegans?

– Implications?

• UNRAVELING THE DNA MYTH: The spurious foundation of genetic engineering, Barry Commoner, Harper’s Magazine, Feb 2002


The human genome

• The answer – Sloppy science

– Gene sets don’t overlap completely

– Floor is 42K

– 128,826 UniGene clusters from ESTs

[Figure: overlap of predicted gene sets; combined total = 42,113]


The human genome

• How finished is the human genome sequence?

– Draft sequence to high coverage

– Chromosome by chromosome finishing now

• Chr 22 – 1999

• Chr 21 – 2000

• Chr 20 – 2001

• Chr 15 – 2003

• Knowing what we know now – how to approach a large new genome?

– Xenopus tropicalis (about ½ human)

– BAC end sequencing

– Whole genome shotgun

– Gaps closed with BACs

– 8x coverage by end of 2004

– Finishing dependent on additional funding


DNA Sequence analysis

• Complete DNA sequence

– complete sequence is desirable but takes time

• how long depends on size and strategy employed

– which strategy to use depends on various factors

• how large is the clone?

– cDNA

– genomic

• How fast is sequence required?

• sequencing strategies

– primer walking

– cloning and sequencing of restriction fragments

– progressive deletions

• bidirectional

• unidirectional

– Shotgun sequencing

• whole genome

• with mapping

– map first (C. elegans)

– map as you go (many)


DNA Sequence analysis (contd)

• Primer walking - walk from the ends with oligonucleotides

– sequence, back up ~50 nt from end, make a primer and continue

• Why back up?
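The arithmetic behind primer walking is easy to sketch. Each round yields one read, and because the next primer is placed about 50 nt back from the end of the previous read, each additional round adds only (read length - 50) new bases. The Python sketch below counts rounds for one strand of a clone; the 500-nt usable read length is an assumption for illustration, not a figure from the lecture, and `walking_rounds` is a hypothetical helper name.

```python
import math

def walking_rounds(clone_length: int, read_length: int = 500, backup: int = 50) -> int:
    """Sequencing rounds needed to walk across one strand of a clone.

    Assumes the first read starts from a vector primer at position 0, every read
    gives `read_length` usable bases, and each new primer sits `backup` nt
    upstream of the end of the previous usable read.
    """
    if read_length >= clone_length:
        return 1
    step = read_length - backup  # net new bases gained per additional round
    if step <= 0:
        raise ValueError("backup must be shorter than the read length")
    # the first read covers read_length bases; every later read adds `step`
    return 1 + math.ceil((clone_length - read_length) / step)

for size in (1_500, 3_000, 40_000):
    rounds = walking_rounds(size)
    print(f"{size:>6} bp clone: {rounds} rounds, {rounds - 1} custom primers")
```

Under these assumptions a 1.5-kb cDNA needs 4 rounds, while a 40-kb clone needs 89; at roughly one week per round, that is why walking is reserved for short DNA and targeted work.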


DNA Sequence analysis (contd)

• Primer walking (contd)

– advantages

• very simple

• no risk of losing pieces of DNA, as can happen with

– restriction mapping

– deletion methods

• no restriction map needed

• best choice for short DNA

– disadvantages

• slowest method

– about a week between sequencing runs

• oligos are not free (and not reusable)

• not feasible for large sequences

– applications

• cDNA sequencing when time is not critical

• targeted sequencing

– verification

– closing gaps in sequences


DNA Sequence analysis (contd)

• Cloning and sequencing of restriction fragments

– once the most popular method

• make a restriction map

• subclone fragments

• sequence

– advantages

• straightforward

• directed approach

• can go quickly

• cloned fragments are often useful for other purposes

– RNase protection

– nuclease mapping

– in situ hybridization

– disadvantages

• possible to lose small fragments

– must run high quality analytical gels

• depends on quality of restriction map

– mistaken mapping -> wrong sequence

• limited by restriction site availability

– applications

• sequencing small cDNAs

• isolating regions to close gaps
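At bottom, a restriction map is a list of cut positions and fragment sizes. The sketch below computes both for a single enzyme on a linear, made-up insert and flags fragments below a size cutoff, the kind that can run off or be missed on an analytical gel. The 100-bp cutoff and all helper names are assumptions for illustration; a real map would combine several enzymes and be built from gel sizes rather than known sequence.

```python
def find_cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions for every match of `site` (overlapping matches included)."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return cuts

def fragment_lengths(seq_length: int, cuts: list[int]) -> list[int]:
    """Fragment lengths from cutting a linear molecule at every position in `cuts`."""
    bounds = [0] + sorted(set(cuts)) + [seq_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def small_fragments(lengths: list[int], min_visible: int = 100) -> list[int]:
    """Fragments below an assumed detection cutoff; easy to lose when subcloning."""
    return [n for n in lengths if n < min_visible]

# made-up insert with three EcoRI sites (G^AATTC, so the cut falls 1 nt into the site)
insert = "A" * 300 + "GAATTC" + "C" * 40 + "GAATTC" + "T" * 500 + "GAATTC" + "G" * 200
cuts = find_cut_positions(insert, "GAATTC", cut_offset=1)
sizes = fragment_lengths(len(insert), cuts)
print(cuts)                    # [301, 347, 853]
print(sizes)                   # [301, 46, 506, 205]
print(small_fragments(sizes))  # [46]
```

The 46-bp fragment is the sort of piece that a low-quality gel, or a mistaken map, would silently drop from the assembled sequence.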


DNA Sequence analysis (contd)

• nested deletion strategies - make sequential deletions from one end of the clone

– cut, close and sequence

• make restriction map

• use enzymes that cut in polylinker and insert

• religate, sequence from end with restriction site

• repeat until finished, filling in gaps with oligos
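The cut, close and sequence steps can be sketched as a small calculation. If an enzyme cuts once in the polylinker next to the sequencing primer and again inside the insert, complete digestion and religation delete everything up to that enzyme's farthest internal site, so the primer now reads from there. The sketch assumes complete digestion, exactly one polylinker site per enzyme on the primer side, and a 500-nt read; enzyme names, site positions, and helper names are all hypothetical.

```python
def deletion_read_starts(internal_sites: dict[str, list[int]]) -> dict[str, int]:
    """Insert position the primer reads from after cutting with each enzyme and religating.

    Assumes each enzyme also cuts once in the polylinker on the primer side
    (insert position 0) and digests to completion, so religation removes the
    insert up to that enzyme's farthest internal site.
    """
    return {enzyme: max(sites) for enzyme, sites in internal_sites.items() if sites}

def uncovered_gaps(insert_length: int, starts: list[int], read_length: int) -> list[tuple[int, int]]:
    """Stretches of the insert reached by no read; these are left for oligo primers."""
    gaps, covered_to = [], 0
    for start in sorted(starts):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, start + read_length)
    if covered_to < insert_length:
        gaps.append((covered_to, insert_length))
    return gaps

# hypothetical 2.5-kb insert; enzyme names and site positions are made up
sites = {"enzA": [350, 900], "enzB": [1400], "enzC": []}
starts = [0] + list(deletion_read_starts(sites).values())  # 0 = the undeleted clone
print(sorted(starts))                                 # [0, 900, 1400]
print(uncovered_gaps(2500, starts, read_length=500))  # [(500, 900), (1900, 2500)]
```

Note that enzA's site at 350 never becomes a read start, because religation jumps to its farthest site at 900; this is the restriction-site-availability limitation in miniature.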


DNA Sequence analysis (contd)

• nested deletion strategies (contd)

– cut, close and sequence (contd)

• advantages

– fast

– simple

– efficient

• disadvantages

– limited by restriction site availability in vector and insert

– need to make a restriction map

– BAL31-mediated deletions (archaic)

• digest insert from both ends with BAL31

• repair, subclone and sequence

• advantages

– was once the only way to make progressive deletions

• disadvantages

– bidirectional

– can’t protect vector ends -> must reclone into fresh vector

• applications

– no longer used

– superseded by ExoIII-mediated deletion cloning


DNA Sequence analysis (contd)

• nested deletion strategies (contd)

– Exonuclease III-mediated deletion


DNA Sequence analysis (contd)

– Exonuclease III-mediated deletion (contd)

• cut with polylinker enzyme

– protect this end with

» 3’ overhang

» phosphorothioate

• cut with enzyme between first cut and the insert

– must not leave a 3’ overhang (Exo III starts at 5’ overhangs or blunt ends)

• timed digestions with Exonuclease III

• stop reactions, blunt ends

• ligate and size select recombinants

• sequence

• advantages

– unidirectional

– processivity of enzyme gives nested deletions
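Because Exo III removes nucleotides at a fairly steady rate, pulling aliquots at fixed times gives a nested set of deletions whose sizes scale with digestion time. The sketch below turns time points into approximate deletion endpoints and checks whether reads started from each endpoint would overlap. The 400 nt/min rate, the 450-nt read length, and the helper names are assumptions for illustration; in practice the rate depends on reaction conditions and is calibrated empirically.

```python
RATE_NT_PER_MIN = 400  # assumed digestion rate; calibrate for real reaction conditions

def deletion_endpoints(insert_length: int, minutes: list[float],
                       rate: float = RATE_NT_PER_MIN) -> list[int]:
    """Approximate nt removed from the insert for each timed aliquot, at a constant rate."""
    return [min(insert_length, round(rate * t)) for t in minutes]

def reads_overlap(insert_length: int, endpoints: list[int], read_length: int) -> bool:
    """True if reads starting at 0 and at every endpoint overlap and reach the far end."""
    points = sorted({0, *endpoints})
    steps_ok = all(b - a < read_length for a, b in zip(points, points[1:]))
    return steps_ok and insert_length - points[-1] <= read_length

every_min = deletion_endpoints(3000, [1, 2, 3, 4, 5, 6, 7])
every_2min = deletion_endpoints(3000, [2, 4, 6])
print(every_min)                             # [400, 800, 1200, 1600, 2000, 2400, 2800]
print(reads_overlap(3000, every_min, 450))   # True
print(reads_overlap(3000, every_2min, 450))  # False
```

Sampling too sparsely leaves 800-nt steps that a 450-nt read cannot bridge, which is the "may not get complete overlaps" problem that gets filled in with restriction fragments or oligos.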


DNA Sequence analysis (contd)

• Nested deletion strategies

– Exonuclease III-mediated deletion (contd)

• disadvantages

– need two unique restriction sites flanking insert on each side

– best used successively to get > 10kb total deletions

– may not get complete overlaps of sequences

» fill in with restriction fragments or oligos

• applications

– method of choice for moderate size sequencing projects

» cDNAs

» genomic clones

– good for closing larger gaps


DNA Sequence analysis (contd)

• Shotgun sequencing

– NOT invented by Craig Venter

• Messing 1981 first description of shotgun

• Sanger lab developed current methods in 1983

– approach

• blast genome into bite-sized chunks

• clone these chunks

• sequence

• assemble the whole mess with a computer

– idealized shotgun strategy

– A priori difficulties

• how to get a nice uniform distribution of fragments

• how to assemble fragments

• what to do about repeats?

• How to minimize sequence redundancy?
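A toy version of the "assemble the whole mess with a computer" step: a greedy assembler that repeatedly merges the pair of sequences with the longest suffix/prefix overlap. This is a teaching sketch under strong assumptions (error-free reads, no repeats, a made-up minimum overlap of 4 nt), not how production assemblers work; the reads and function names are hypothetical.

```python
from itertools import permutations

def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` that is also a prefix of `b`; 0 if shorter than `min_len`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Toy greedy assembler for error-free reads: repeatedly merge the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # drop reads wholly contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_n, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            n = overlap(a, b, min_len)
            if n > best_n:
                best_n, best_a, best_b = n, a, b
        if best_n == 0:
            break  # nothing overlaps: remaining contigs are separated by gaps
        merged = best_a + best_b[best_n:]
        contigs.remove(best_a)
        contigs.remove(best_b)
        # keep the merged contig and discard anything it now contains
        contigs = [c for c in contigs if c not in merged] + [merged]
    return contigs

reads = ["GGATCCTGA", "ATGGCATTC", "CATTCGGAT"]  # made-up reads, deliberately out of order
print(greedy_assemble(reads))  # ['ATGGCATTCGGATCCTGA']
```

Greedy merging breaks down as soon as a repeat is longer than the reads, because reads from different copies then share exact overlaps and get joined wrongly; that is the "what to do about repeats?" difficulty in small form.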


DNA Sequence analysis (contd)

• Shotgun sequencing (contd)

– How to minimize sequence redundancy?

• Best way to minimize redundancy is map before you start

– C. elegans was done this way - when the sequence was finished, it was FINISHED

» mapping took almost 10 years

– mapping much too tedious and nonprofitable for Celera

» who cares about redundancy, let’s sequence and make $$

• why does redundancy matter?

– Finished sequence today costs about $0.50/base
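One standard way to quantify redundancy in random shotgun sequencing is the Lander-Waterman (Poisson) model: with N reads of length L on a genome of length G, coverage is c = N·L/G, and the expected fraction of bases never read is e^-c. The sketch below applies that model; the 1-Gb genome and 500-nt read length are hypothetical, and the model ignores cloning bias and repeats, so real gaps are larger than it predicts.

```python
import math

def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of the genome hit by no read: exp(-c) under the Poisson model."""
    return math.exp(-coverage)

def reads_needed(genome_length: int, read_length: int, coverage: float) -> int:
    """Reads required for c-fold coverage, from c = N * L / G."""
    return math.ceil(coverage * genome_length / read_length)

GENOME_LENGTH = 1_000_000_000  # hypothetical 1-Gb genome
READ_LENGTH = 500              # assumed usable read length

for c in (1, 2, 4, 8, 10):
    reads = reads_needed(GENOME_LENGTH, READ_LENGTH, c)
    missing = fraction_unsequenced(c) * GENOME_LENGTH
    print(f"{c:>2}x coverage: {reads:>11,} reads, ~{missing:,.0f} bases never read")
```

Going from 8x to 10x costs another 4 million reads here but only shrinks the expected unsequenced fraction from about 0.03% to about 0.005%; each extra fold of redundancy buys less, which is why mapping first or targeted gap closure can be cheaper than more random reads.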


DNA Sequence analysis (contd)

– Mapping by fingerprinting

– Mapping by hybridization
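Mapping by fingerprinting rests on a simple idea: overlapping clones share restriction fragments of the same size. The sketch below compares two fingerprints by pairing bands within a size tolerance and reporting the shared fraction. The 7-bp tolerance, the greedy pairing, and the shared-fraction score are simplifications chosen for illustration; they are not the coincidence-probability scoring used by real fingerprint-mapping software, and all band sizes are made up.

```python
def shared_bands(bands_a: list[int], bands_b: list[int], tolerance: int = 7) -> int:
    """Count bands in A that pair with a not-yet-used band in B of similar size."""
    unused = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for i, other in enumerate(unused):
            if abs(size - other) <= tolerance:
                shared += 1
                del unused[i]  # each band in B can match only once
                break
    return shared

def overlap_fraction(bands_a: list[int], bands_b: list[int], tolerance: int = 7) -> float:
    """Shared bands as a fraction of the smaller fingerprint (0.0 if either is empty)."""
    if not bands_a or not bands_b:
        return 0.0
    return shared_bands(bands_a, bands_b, tolerance) / min(len(bands_a), len(bands_b))

# made-up fragment sizes (bp) for three hypothetical clones
clone1 = [1200, 950, 800, 610, 450, 300]
clone2 = [955, 805, 612, 2100, 1500, 700]
clone3 = [2000, 1750, 400, 220]
print(overlap_fraction(clone1, clone2))  # 0.5 -> candidate overlap
print(overlap_fraction(clone1, clone3))  # 0.0 -> no evidence of overlap
```

Chaining high-scoring pairs into contigs of clones gives a physical map, so a minimal tiling set can be sequenced with little redundancy.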


DNA Sequence analysis (contd)

• Map as you go


DNA Sequence analysis (contd)

– Whole genome shotgun sequencing (Celera)

• premise is that rapid generation of draft sequence is valuable

• why bother trying to clone and sequence difficult regions?

– Basically just forget regions of repetitive DNA - not cost effective

• using this approach, genome is alleged to be 90% finished

– rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%

• problems

– unlike C. elegans, the sequence may never be complete

– much redundant sequence with many sparse regions and lots of gaps.

– Fragment assembly for regions of highly repetitive DNA is dubious at best

– “Finished” fly and human genomes lack more than a few already characterized genes


Useful software for molecular biology


Useful software for molecular biology (contd)

• NCBI - main information and analysis resource

– indispensable resource


Useful software for molecular biology (contd)

• Why pay Celera?