iGenetics A Molecular Approach Peter J. Russell Third Edition


ISBN 978-1-29202-633-6


Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England

and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Printed in the United States of America

ISBN 10: 1-292-02633-2
ISBN 13: 978-1-292-02633-6


Genomics: The Mapping and Sequencing of Genomes

Table B

                                     Enzyme and Recognition Sequence
Species                     ApaI     HindIII  SacI     SspI     SrfI        NotI
                            GGGCCC   AAGCTT   GAGCTC   AATATT   GCCCGGGC    GCGGCCGC
Escherichia coli            68,000   8,000    31,000   2,000    120,000     200,000
Mycobacterium tuberculosis  2,000    18,000   4,000    32,000   10,000      4,000
Saccharomyces cerevisiae    15,000   3,000    8,000    1,000    570,000     290,000
Arabidopsis thaliana        52,000   2,000    5,000    1,000    no sites    610,000
Caenorhabditis elegans      38,000   3,000    5,000    800      1,110,000   260,000
Drosophila melanogaster     13,000   3,000    6,000    900      170,000     83,000
Mus musculus                5,000    3,000    3,000    3,000    120,000     120,000
Homo sapiens                5,000    4,000    5,000    1,000    120,000     260,000

with chemicals (e.g., alkaline conditions) and/or heat is critical to many methods used to produce and analyze cloned DNA. Give three examples of methods that rely on complementary base pairing, and explain what role complementary base pairing plays in each of these methods.

3 Restriction endonucleases are naturally found in bacteria. What purposes do they serve?

*4 A new restriction endonuclease is isolated from a bacterium. This enzyme cuts DNA into fragments that average 4,096 base pairs long. Like many other known restriction enzymes, the new one recognizes a sequence in DNA that has twofold rotational symmetry. From the information given, how many base pairs of DNA constitute the recognition sequence for the new enzyme?

*5 An endonuclease called AvrII (“a-v-r-two”) cuts DNA whenever it finds the sequence

5′-CCTAGG-3′
3′-GGATCC-5′

a. About how many cuts would AvrII make in the human genome, which contains about $3 \times 10^9$ base pairs of DNA and in which 40% of the base pairs are G–C?

b. On average, how far apart (in base pairs) will two AvrII sites be in the human genome?

c. In the cellular slime mold Dictyostelium discoideum, about 80% of the base pairs in regions between genes are A–T. On average, how far apart (in base pairs) will two AvrII sites be in these regions?

6 About 40% of the base pairs in human DNA are G–C. On average, how far apart (in base pairs) will the following sequences be?
a. two BamHI sites
b. two EcoRI sites
c. two NotI sites
d. two HaeIII sites

*7 The average size of fragments (in base pairs) observed after genomic DNA from eight different species was individually cleaved with each of six different restriction enzymes is shown in Table B.

a. Assuming that each genome has equal amounts of A, T, G, and C, and that on average these bases are uniformly distributed, what average fragment size is expected following digestion with each enzyme?

b. How might you explain each of the following?

i. There is a large variation in the average fragment sizes when different genomes are cut with the same enzyme.

ii. There is a large variation in the average fragment sizes when the same genome is cut with different enzymes that recognize sites having the same length (e.g., ApaI, HindIII, SacI, and SspI).

iii. Both SrfI and NotI, which each recognize an 8-bp site, cut the Mycobacterium genome more frequently than SspI and HindIII, which each recognize a 6-bp site.

*8 What features are required in all vectors used to propagate cloned DNA? What different types of cloning vectors are there, and how do these differ from each other?

9 The plasmid pBluescript II is a plasmid cloning vector used in E. coli. What features does it have that make it useful for constructing and cloning recombinant DNA molecules? Which of these features are particularly useful during the sequencing of a genome?

*10 A colleague has sent you a 2-kb DNA fragment excised from a plasmid cloning vector with the enzyme PstI (see Table 1 for a description of this enzyme and the restriction site it recognizes).

a. List the steps you would take to clone the DNA fragment into the plasmid vector pBluescript II (shown in Figure 4), and explain why each step is necessary.

b. How would you verify that you have cloned the fragment?

*11 E. coli, like all bacterial cells, has its own restriction endonucleases that could interfere with the propagation of foreign DNA in plasmid vectors. For example, wild-type E. coli has a gene, hsdR, that encodes a restriction endonuclease that cleaves DNA that is not methylated at certain A residues. Why is it important to inactivate this enzyme by mutating the hsdR gene in strains of E. coli that will be used to propagate plasmids containing recombinant DNA?

12 E. coli is a commonly used host for propagating DNA sequences cloned into plasmid vectors. Wild-type E. coli turns out to be an unsuitable host, however: the plasmid vectors are “engineered,” and so is the host bacterium. For example, nearly all strains of E. coli used for propagating recombinant DNA molecules carry mutations in the recA gene. The wild-type recA gene encodes a protein that is central to DNA recombination and DNA repair. Mutations in recA eliminate general recombination in E. coli and render E. coli sensitive to UV light. How might a recA mutation make an E. coli cell a better host for propagating a plasmid carrying recombinant DNA? (Hint: What type of events involving recombinant plasmids and the E. coli chromosome will recA mutations prevent?) What additional advantage might there be to using recA mutants, considering that some of the E. coli cells harboring a recombinant plasmid could accidentally be released into the environment?

*13 Genomic libraries are important resources for isolating genes and for studying the functional organization of chromosomes. List the steps you would use to make a genomic library of yeast in a plasmid vector. In what fundamental way would you modify this procedure if you were making the library in a BAC vector?

14 Three students are working as a team to construct a plasmid library from Neurospora genomic DNA. They want the library to have, on average, about 4-kb inserts. Each student proposes a different strategy for constructing the library, as follows:

Mike: Cleave the DNA with a restriction enzyme that recognizes a 6-bp site, which appears about once every 4,096 bp on average and leaves sticky, overhanging ends. Ligate this DNA into the plasmid vector cut with the same enzyme, and transform the ligation products into bacterial cells.

Marisol: Partially digest the DNA with a restriction enzyme that cuts DNA very frequently, say once every 256 bp, and that leaves sticky overhanging ends. Select DNA that is about 4 kb in size (e.g., purify fragments this size after the products of the digest are resolved by gel electrophoresis). Then, ligate this DNA to a plasmid vector cleaved with a restriction enzyme that leaves the same sticky overhangs and transform the ligation products into bacterial cells.

Hesham: Irradiate the DNA with ionizing radiation, which will cause double-stranded breaks in the DNA. Determine how much irradiation should be used to generate, on average, 4-kb fragments and use this dose. Ligate linkers to the ends of the irradiated DNA, digest the linkers with a restriction enzyme to leave sticky overhanging ends, ligate the DNA to a similarly digested plasmid vector, and then transform the ligation products into bacterial cells.

Which student’s strategy will ensure that the inserts are representative of all of the genomic sequences? Why are the other students’ strategies flawed?

*15 Some restriction enzymes leave sticky ends, while others leave blunt ends. It is more efficient to clone DNA fragments with sticky ends than DNA fragments with blunt ends. What is the best way to efficiently clone a set of DNA fragments having blunt ends?

*16 The human genome contains about $3 \times 10^9$ bp of DNA. How many 200-kb fragments would you have to clone into a BAC library to have a 90% probability of including a particular sequence?

17 A biochemist studies a protein with antifreeze properties that he found in an Antarctic fish. After determining part of the protein’s amino acid sequence, he decides he would like to obtain the DNA sequence of its gene. He has no experience in genome analysis and mistakenly thinks he needs to sequence the entire genome of the fish to obtain this information. When he asks a more knowledgeable colleague about how to sequence the fish genome, she describes the whole-genome shotgun approach and the need to obtain about 7-fold coverage. The biochemist decides that this approach provides far more information than he needs and so embarks on an alternate approach he thinks will be faster. He decides to sequence individual clones chosen at random from a library made with genomic DNA from the Antarctic fish. After sequencing the insert of a clone, he will analyze it to see if it contains an ORF with the sequence of amino acids he knows are present in the antifreeze protein. If it does, he will have found what he wants and will not sequence any additional clones. If it does not, he plans to keep obtaining and analyzing the sequences of individual clones sequentially until he finds a clone that has the sequence of interest. He thinks this approach will let him sequence fewer clones and be faster than the whole-genome shotgun approach.

He must decide which vector to use in building his genomic library. He can construct a library made in the pBluescript II vector with inserts that are, on average, 7 kb, a library made in the vector pBeloBAC11 with inserts that are, on average, 200 kb, and a library made in a YAC vector with inserts that are, on average, 1 Mb. He assumes that any library he constructs will have an equally good representation of the $2 \times 10^9$ base pairs in a haploid copy of the fish genome, that the antifreeze gene is less than 2 kb in size, and that (somehow) he can easily obtain the sequence of the DNA inserted into a clone.

a. Given the biochemist’s assumptions, what is the chance that he will find the antifreeze gene if he sequences the insert of just one clone from each library? Based on this information, which library should he use if he wants to sequence the fewest number of clones?

b. When he tries to sequence the insert of the first clone he picks from the library suggested by a colleague in (a), he realizes that he does not enjoy this type of lab work. So, he hires a technician with experience in genomics, assigns the project to her, and goes to Antarctica to catch more fish. He tells her to sequence the inserts of enough clones to be 95% certain of obtaining at least one insert containing the antifreeze gene and says he will analyze all of the sequence data for the presence of the antifreeze gene after he returns. How many clones should she sequence to satisfy this requirement if he constructed the genomic library in a plasmid vector? a BAC vector? a YAC vector?

c. What advantages and disadvantages does each of the different vectors have for constructing libraries with cloned genome DNA?

d. Suppose the Antarctic fish has a very AT-rich genome and the biochemist propagated the genomic library using E. coli. Will the library be representative of all the sequences in the genome of the fish?

*18 When Celera Genomics sequenced the human genome, they obtained 13,543,099 reads of plasmids having an average insert size of 1,951 bp, and 10,894,467 reads of plasmids having an average insert size of 10,800 bp.

a. Dideoxy sequencing provides only about 500–550 nucleotides of sequence. About how many nucleotides of sequence did Celera obtain from sequencing these two plasmid libraries? To what fold coverage does this amount of sequence information correspond?

b. Why did they sequence plasmids from two libraries with different-sized inserts?

c. They sequenced only the ends of each insert. How did they determine the sequence lying between the sequenced ends?

*19

a. What features of pBluescript II facilitate obtaining the sequence at the ends of an insert?

b. Devise a strategy to obtain the entire sequence of a 7-kb insert in pBluescript II.

c. Devise a strategy to obtain the entire sequence of a 200-kb insert in pBeloBAC11.

20 Explain how the whole-genome shotgun approach to sequencing a genome differs from the biochemist’s approach described in Question 17. What information does it provide that the biochemist’s approach does not? What does it mean to obtain 7-fold coverage, and why did his colleague advise him to do this?

*21 In a sequencing reaction using dideoxynucleotides that are labeled with different fluorescent dyes, the DNA chains produced by the reaction are separated by size using capillary gel electrophoresis and then detected by a laser eye as they exit the capillary. A computer then converts the differently colored fluorescent peaks into a pseudocolored trace. Suppose green is used for A, black for G, red for T, and blue for C. What pattern of peaks do you expect to see on a sequencing trace if you carry out a dideoxy sequencing reaction after the primer

5′-CTAGG-3′

is annealed to the following single-stranded DNA fragment?

3′-GATCCAAGTCTACGTATAGGCC-5′

22 How does pyrosequencing differ from dideoxy chain-termination sequencing? What advantages does it have for large-scale sequencing projects?

23 Do all SNPs lead to an alteration in phenotype? Explain why or why not.

24 Researchers at Perlegen Sciences sought to identify tag SNPs on human chromosome 21. After determining the genotypes at 24,047 common SNPs in 20 hybrid cell lines containing a single, different human chromosome 21, they used computerized algorithms to identify haplotypes containing between 2 and 114 SNPs that cover the entire chromosome. A total of 2,783 tag SNPs were selected from SNPs within these blocks.

a. What is a SNP marker?

b. How do haplotypes arise in members of a population?

c. What is a hapmap?

d. What is a tag SNP?

e. What advantages were there for the researchers to use hybrid cell lines instead of genomic DNA from 20 different individuals?

f. The 20 individuals whose chromosome 21 was used in this analysis were unrelated and had different ethnic origins. Do you expect the haplotypes and number of tag SNPs to differ if

i. the cell lines were established from blood samples drawn at a large family reunion.

ii. the cell lines were established from unrelated individuals, but their ancestors originated in the same geographical region.

*25 A set of hybrid cell lines containing a single copy of the same human chromosome from 10 different individuals was genotyped for 26 SNPs, A through Z. The SNPs are present on the chromosome in the order A, B, C, . . . Z. Table C lists the SNP alleles present in each cell line. State which SNPs can serve as tag SNPs, and which haplotypes they identify. What is the minimum number of tag SNPs needed to differentiate between the haplotypes present on this chromosome?

26 Some features that we commonly associate with racial identity, such as skin pigmentation, hair shape, and facial morphology, have a complex genetic basis. However, it turns out that these features are not representative of the genetic differences between racial groups (individuals assigned to different racial categories share many more DNA polymorphisms than not), supporting the contention that race is a social and not a biological construct. How could you use DNA chips to quantify the percentage of SNPs that are shared between individuals assigned to different racial groups?

*27 Mutations in the dystrophin gene can lead to Duchenne muscular dystrophy. The dystrophin gene is among the largest known: it has a primary transcript that spans 2.5 Mb, and it produces a mature mRNA that is about 14 kb. Many different mutations in the dystrophin gene have been identified. What steps would you take if you wanted to use a DNA microarray to identify the specific dystrophin gene mutation present in a patient with Duchenne muscular dystrophy?

28 Three of the steps in the analysis of a genome’s sequence are assembly, finishing, and annotation. What is involved in each step, and how do they differ from each other?

29 What is a cDNA library, and from what cellular material is it derived? How is a cDNA synthesized, and how do the steps used to clone a cDNA differ from the steps used to clone genomic DNA? How are cDNA sequences used to help annotation of a sequenced genome?

*30 Eukaryotic genomes differ in their repetitive DNA content. For example, consider the typical euchromatic 50-kb segment of human DNA that contains the human T-cell receptor. About 40% of it is composed of various genome-wide repeats, about 10% encodes three genes (with introns), and about 8% is taken up by a pseudogene. Compare this to the typical 50-kb segment of yeast DNA containing the HIS4 gene. There, only about 12% is composed of a genome-wide repeat, and about 70% encodes genes (without introns). The remaining sequences in each case are untranscribed and either contain regulatory signals or have no discernible information. Whereas some repetitive sequences can be interspersed throughout gene-containing euchromatic regions, others are abundant near centromeres. What problems do these repetitive sequences pose for sequencing eukaryotic genomes? When can these problems be overcome, and how?

31 What is the difference between a gene and an ORF? Explain whether all ORFs correspond to a true gene, and if they do not, what challenges this poses for genome annotation.

*32 Once a genomic region is sequenced, computerized algorithms can be used to scan the sequence to identify potential ORFs.

a. Devise a strategy to identify potential prokaryotic ORFs by listing features accessible by an algorithm checking for ORFs.

b. Why does the presence of introns within transcribed eukaryotic sequences preclude direct application of this strategy to eukaryotic sequences?

c. The average length of exons in humans is about 100–200 bp, while the length of introns can range from about 100 to many thousands of base pairs. What challenges do these findings pose for identifying exons in uncharacterized regions of the human genome?

d. How might you modify your strategy to overcome some of the problems posed by the presence of introns in transcribed eukaryotic sequences?

33 Annotation of genomic sequences makes them much more useful to researchers. What features should be included in an annotation, and in what different ways can they be depicted? For some examples of current annotations in databases, see the following websites:

http://www.yeastgenome.org/

http://flybase.org (Drosophila)

http://www.tigr.org/tdb/e2k1/ath1/ (Arabidopsis)

http://www.ncbi.nlm.nih.gov/genome/guide/human/ (humans)

http://genome.ucsc.edu/cgi-bin/hgGateway (humans)

http://www.h-invitational.jp/

Table C

Cell Line
1    2    3    4    5    6    7    8    9    10
A1   A1   A2   A3   A1   A3   A2   A3   A1   A2
B1   B1   B2   B3   B2   B3   B2   B3   B1   B2
C3   C3   C1   C2   C1   C2   C1   C2   C3   C1
D4   D4   D3   D2   D1   D2   D3   D2   D4   D3
E1   E1   E2   E2   E3   E2   E2   E2   E1   E2
F2   F1   F2   F2   F2   F1   F2   F2   F2   F2
G3   G2   G3   G3   G1   G2   G1   G3   G1   G3
H1   H1   H1   H1   H2   H1   H2   H1   H2   H1
I3   I1   I3   I3   I2   I1   I2   I3   I2   I3
J2   J1   J2   J2   J2   J1   J2   J2   J2   J2
K1   K1   K1   K1   K2   K1   K2   K1   K1   K1
L2   L1   L2   L2   L1   L1   L1   L2   L2   L2
M1   M1   M2   M1   M1   M2   M2   M1   M2   M1
N2   N2   N1   N2   N2   N1   N1   N2   N1   N2
O1   O1   O1   O1   O1   O2   O1   O1   O1   O2
P2   P1   P2   P1   P2   P1   P1   P1   P2   P1
Q2   Q2   Q2   Q2   Q2   Q1   Q2   Q2   Q2   Q1
R3   R1   R3   R1   R3   R2   R1   R1   R3   R2
S1   S2   S1   S2   S1   S1   S2   S2   S1   S1
T1   T1   T1   T1   T1   T1   T1   T1   T1   T1
U2   U1   U2   U1   U2   U2   U1   U1   U2   U2
V2   V2   V2   V2   V2   V2   V2   V2   V2   V2
W2   W3   W1   W2   W1   W3   W1   W1   W3   W1
X1   X2   X1   X1   X3   X2   X3   X1   X2   X3
Y2   Y1   Y4   Y2   Y3   Y1   Y3   Y4   Y1   Y3
Z1   Z1   Z2   Z1   Z2   Z1   Z2   Z2   Z1   Z2


*34 One powerful approach to annotating genes is to compare the structures of cDNA copies of mRNAs to the genomic sequences that encode them. Indeed, a large collaboration involving 68 research teams analyzed 41,118 full-length cDNAs to annotate the structure of 21,037 human genes (see http://www.h-invitational.jp/).

a. What types of information can be obtained by comparing the structures of cDNAs with genomic DNA?

b. During the synthesis of cDNA (see Figure 15), reverse transcriptase may not always copy the entire length of the mRNA and so a cDNA that is not full-length can be generated. Why is it desirable, when possible, to use full-length cDNAs in these analyses?

c. The research teams characterized the number of loci per Mb of DNA for each chromosome. Among the autosomes, chromosome 19 had the highest ratio of 19 loci per Mb while chromosome 13 had the lowest ratio of 3.5 loci per Mb. Among the sex chromosomes, the X had 4.2 loci per Mb while the Y had only 0.6 loci per Mb. What does this tell you about the distribution of genes within the human genome? How can these data be reconciled with the idea that chromosomes have gene-rich regions as well as gene deserts?

d. When the research teams completed their initial analysis, they were able to map 40,140 cDNAs to the available human genome sequence. Another 978 cDNAs could not be mapped. Of these 978 cDNAs, 907 cDNAs could be roughly mapped to the mouse genome. Why might some (human) cDNAs be unable to be mapped to the human genome sequence that was available at the time although they could be mapped to the mouse genome sequence? (Hint: Consider where errors and limited information might exist.)

*35 How has genomic analysis provided evidence that Archaea is a branch of life distinct from Bacteria and Eukarya?

36 The genomes of many different organisms, including bacteria, rice, and dogs, have been sequenced. Choose three phylogenetically diverse organisms. Compare the rationales for sequencing their genomes, and describe what we have learned from sequencing each genome.

37 In which type of organisms does gene number appear to be related to genome size? Explain why this is not the case in all organisms.

38 The C-value paradox states that there is no obvious relationship between an organism’s haploid DNA content and its organizational and structural complexity. Discuss, citing data from the genome sequencing, whether there is also a gene-number paradox or a gene-density paradox.

39 In the United States, 3–5% of public funds used to support the Human Genome Project were devoted to research to address its ethical, legal, social, and policy implications. Some of the results are described in the website http://www.ornl.gov/sci/techresources/Human_Genome/elsi/elsi.shtml. After exploring this website, answer the following questions.

a. Summarize the main ethical, legal, social, and policy issues associated with the Human Genome Project.

b. Why is legislation necessary to protect an individual’s genetic privacy? What such legislation currently exists?

c. What are the pros and cons of gene testing?

d. Both presymptomatic and symptomatic individuals are subject to gene testing for an inherited disease. How are gene tests used in each situation, and how do the concerns about using gene testing differ in these situations?

e. Are laboratories that conduct genetic testing regulated by law?


Solutions to Selected Questions and Problems

2 Examples of methods that utilize the hydrogen bonding in complementary base pairing include: (1) the binding of complementary sticky ends present in a cloning vector and a DNA fragment prior to their ligation by DNA ligase; (2) the annealing of a labeled nucleic acid to a complementary single-stranded DNA fragment on a microarray; (3) the annealing of an oligo(dT) primer to a poly(A) tail during the synthesis of cDNA from mRNA; and (4) the annealing of a primer to a template during a DNA sequencing reaction. In each case, base pairing allows for nucleotides to interact in a sequence-specific manner essential for the procedure’s success. For example, the binding of a primer to a template at the start of a DNA sequencing reaction requires complementary base pairing between the sequences in the primer and the template, which in turn defines where the DNA sequencing reaction will start.

4 The average length of the fragments produced indicates how often, on average, the restriction site appears. If the DNA is composed of equal amounts of A, T, C, and G, the chance of finding one specific base pair (A–T, T–A, G–C, or C–G) at a particular site is $1/4$. The chance of finding two specific base pairs at a site is $(1/4)^2$. In general, the chance of finding $n$ specific base pairs at a site is $(1/4)^n$. Here, $1/4{,}096 = (1/4)^6$, so the enzyme recognizes a 6-bp site.

5

a. Since 40% of the genome is composed of G–C pairs, $P(G) = P(C) = 0.20$ and $P(A) = P(T) = 0.30$. Therefore, $P(\text{CCTAGG}) = (0.20)^4 \times (0.30)^2 = 0.000144$. A genome with $3 \times 10^9$ base pairs will have about $3 \times 10^9$ different groups of 6-bp sequences. Thus, the number of sites is $(0.000144) \times (3 \times 10^9) = 432{,}000$.

b. $3 \times 10^9$ bp / 432,000 sites $= 1/0.000144 = 6{,}944$ bp between sites.

c. $P(\text{CCTAGG}) = (0.10)^4 \times (0.40)^2 = 0.000016$, so two AvrII sites are expected to be about $1/0.000016 = 62{,}500$ bp apart.
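The arithmetic in Solutions 4 and 5 can be checked with a short script. This is only a sketch under the same simplifying assumption the solutions make (each base occurs independently, with frequencies set by the G–C content); the function names `site_probability` and `recognition_site_length` are hypothetical helpers introduced here for illustration, not part of the text.

```python
import math


def site_probability(site: str, gc_fraction: float) -> float:
    """Chance that a random position starts a match to `site`,
    assuming independent bases with P(G) = P(C) and P(A) = P(T)."""
    p_gc = gc_fraction / 2          # probability of G, and of C
    p_at = (1 - gc_fraction) / 2    # probability of A, and of T
    prob = 1.0
    for base in site.upper():
        prob *= p_gc if base in "GC" else p_at
    return prob


def recognition_site_length(avg_fragment_bp: float) -> float:
    """Solve (1/4)^n = 1/avg_fragment_bp for n (equal base frequencies)."""
    return math.log(avg_fragment_bp, 4)


genome_bp = 3e9                                   # human genome size used in Question 5
p_human = site_probability("CCTAGG", 0.40)        # 40% G-C
p_dicty = site_probability("CCTAGG", 0.20)        # 80% A-T, so 20% G-C

print(round(recognition_site_length(4096)))       # Solution 4: 6
print(round(p_human * genome_bp))                 # Solution 5a: 432000
print(round(1 / p_human))                         # Solution 5b: 6944
print(round(1 / p_dicty))                         # Solution 5c: 62500
```

The same helper gives the expected spacing for any other recognition sequence by taking the reciprocal of its probability, which is the reasoning Solution 5b applies to AvrII.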
