
Outline

• Last time:
  – Molecular biology primer (sections 3.1-3.7)
  – PCR

• Today:
  – More basic techniques for manipulating DNA (Sec. 3.8)
    • Cutting into shorter fragments
    • Reading fragment lengths
    • Reading DNA sequence
    • Probing presence of specific fragments
  – First algorithm design technique: exhaustive search
    • Application to the partial digest problem (Sec. 4.1-4.3)


Restriction Enzymes

• Discovered in the early 1970s

– Used as a defense mechanism by bacteria to break down the DNA of attacking viruses.

– They cut the DNA into small fragments.

• Can also be used to cut the DNA of organisms.

– This allows the DNA sequence to be handled in more manageable, bite-size pieces.

• It is then possible using standard purification techniques to single out certain fragments and duplicate them to macroscopic quantities.


Molecular Scissors

Molecular Cell Biology, 4th edition, fig. 9-10


Recognition Sites of Restriction Enzymes


Separating DNA by Size

• Gel Electrophoresis
  – DNA fragments are injected into a gel positioned in an electric field
  – DNA is negatively charged near neutral pH, so DNA molecules move towards the positive electrode
  – Smaller molecules move through the gel matrix faster than larger molecules
  – The gel matrix restricts random diffusion so molecules of different lengths separate into bands


Detecting DNA

• Autoradiography

– The DNA is radioactively labeled

– The gel is laid against a sheet of photographic film in the dark, exposing the film at the positions where the DNA is present

• Fluorescence
  – The gel is incubated with a solution containing the fluorescent dye ethidium
  – Ethidium binds to the DNA
  – The DNA lights up when the gel is exposed to ultraviolet light


Gel Electrophoresis: Example

Direction of DNA movement

Smaller fragments travel farther


Sequencing

• Biologists can reliably find the sequence of A/C/T/G for short strings (a few hundred nucleotides)

• Chain termination
  – Single strand template
  – Complementary strand synthesis blocked with small probability at particular nucleotides
  – Lengths of fragments read for each class of strings


Sequencing

• Chain termination (See animation)
  – PCR-like reaction
    • Single primer
    • Complementary strand synthesis is blocked with small probability at particular nucleotides (A/C/T/G)
  – Lengths of fragments read for each class of strings

[Figure: gel lanes labeled A, C, T, G]
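The readout step can be sketched in a few lines of Python. This is a minimal illustration, assuming each lane reports the lengths of the fragments that were terminated at that lane's nucleotide; the helper name `read_lanes` and the lane data are made up for the example and do not come from the slides.

```python
def read_lanes(lanes):
    """Rebuild the synthesized strand from per-nucleotide fragment lengths.

    lanes maps each nucleotide to the lengths of the fragments terminated
    at that nucleotide; a fragment of length k means that position k
    (1-based) of the synthesized strand holds that nucleotide.
    """
    by_position = {}
    for nucleotide, lengths in lanes.items():
        for length in lengths:
            if length in by_position:
                raise ValueError(f"position {length} read in two lanes")
            by_position[length] = nucleotide
    n = len(by_position)
    if sorted(by_position) != list(range(1, n + 1)):
        raise ValueError("some positions were not read")
    return "".join(by_position[k] for k in range(1, n + 1))


# Hypothetical lane data for a 7-nucleotide strand.
example_lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
decoded = read_lanes(example_lanes)
print(decoded)
```

Sorting all fragment lengths across the four lanes and reading off the lane of each one is the same thing a person does when reading the gel from the shortest band upward.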


DNA Microarrays

• Exploit Watson-Crick complementarity to simultaneously perform a large number of substring tests

• Used in a variety of genomic analyses
  – Transcription (gene expression) analysis
  – Single Nucleotide Polymorphism (SNP) genotyping
  – Genomic-based microorganism identification

• Common microarray formats involve direct hybridization between labeled DNA/RNA sample and DNA probes attached to a glass slide
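Viewed computationally, each probe is a substring test on the sample. The sketch below is an idealized model, assuming exact matching, a single-stranded DNA sample, and no cross-hybridization; the function names and sequences are hypothetical.

```python
# Watson-Crick pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the strand that pairs with dna, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def hybridizing_probes(sample, probes):
    """Return the probes whose complement occurs as a substring of sample."""
    return [p for p in probes if reverse_complement(p) in sample]


sample = "ATGCGTAC"
probes = ["GCAT", "AAAA", "GTAC"]
hits = hybridizing_probes(sample, probes)
print(hits)
```

A real array runs all of these tests at once in a single hybridization, which is the point of the technique.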


Direct Hybridization Experiment

Images courtesy of Affymetrix.

Labeled DNA/RNA sample hybridized to array of probes

Laser activation of fluorescent labels

Optical scanning used to identify probes with complements in the mixture


Two-Color Technique

• Sample labeled RED
• Control labeled GREEN
• YELLOW probes hybridize to both sample and control
• BLACK probes hybridize to neither

[Figure labels: cell type 1, cell type 2; RNA 1, RNA 2; Cy3, Cy5; target 1, target 2]


Restriction Maps

• A map of all restriction sites in a DNA sequence

• Can be constructed through both biological and computational methods without knowing the DNA sequence


Full Restriction Digest

• Cutting DNA at each restriction site creates multiple restriction fragments:

• Is it possible to reconstruct the order of the fragments and the positions of the cuts?


Full Restriction Digest: Multiple Solutions

• An alternative ordering of restriction fragments:

[Figure: two alternative orderings of the same restriction fragments]
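A small numeric illustration of why a full digest loses the ordering. The two maps below are made-up examples, not the ones from the slide figure, and `full_digest` is a hypothetical helper that only models the fragment lengths an experiment would report.

```python
def full_digest(cuts):
    """Sorted fragment lengths between consecutive cut positions.

    cuts must include both ends of the DNA (0 and the total length).
    """
    ordered = sorted(cuts)
    return sorted(b - a for a, b in zip(ordered, ordered[1:]))


map_one = [0, 2, 5, 11]  # fragments in order: 2, 3, 6
map_two = [0, 6, 8, 11]  # fragments in order: 6, 2, 3

print(full_digest(map_one), full_digest(map_two))
```

Both maps give the same multiset of lengths, so the full digest alone cannot tell them apart.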


Partial Restriction Digest

• The sample of DNA is exposed to the restriction enzyme for only a limited amount of time to prevent it from being cut at all restriction sites

• We assume that with this method biologists can generate the set of all possible restriction fragments between every two cuts

• We assume that the multiplicity of a fragment can be detected, i.e., the number of restriction fragments of the same length can be determined (e.g., by observing twice as much fluorescence intensity for a double fragment as for a single fragment)

• This set of fragment sizes is used to determine the positions of the restriction sites in the DNA sequence


Partial Digest Example

• For the same DNA sequence as before, we would now get the following restriction fragments:


Partial Digest Fundamentals

Defining some of the terms used in the Partial Digest Problem:

X: the set of n integers representing the location of all cuts in the restriction map, including the start and end

n: the total number of cuts

ΔX: the multiset of integers representing lengths of each of the $\binom{n}{2}$ fragments produced from a partial digest


One More Partial Digest Example

Representation of ΔX = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10} as a two-dimensional table, with elements of X = {0, 2, 4, 7, 10} along both the top and left side. The element at (i, j) in the table is the value xj – xi for 1 ≤ i < j ≤ n.

         0    2    4    7   10
    0         2    4    7   10
    2              2    5    8
    4                   3    6
    7                        3
   10
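The table amounts to taking every pair of cut positions once. A minimal sketch of that computation follows; the function name `delta` is an assumption made for this example.

```python
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances, returned as a sorted list."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))


X = [0, 2, 4, 7, 10]
dX = delta(X)
print(dX)
```

For n = 5 points there are 5 · 4 / 2 = 10 pairs, which matches the ten entries above the diagonal of the table.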


Partial Digest Problem: Formulation

Partial Digest Problem: Given all pairwise distances between points on a line, reconstruct the positions of those points

• Input: The multiset of pairwise distances L, containing $\binom{n}{2}$ integers

• Output: A set X, of n integers, such that ΔX = L


Partial Digest: Multiple Solutions

• It is not always possible to uniquely reconstruct a set X based only on ΔX

• Sets {0,1,2,5,7,9,12} and {0,1,5,7,8,10,12} both produce the partial digest set {1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, 8, 9, 10, 11, 12}
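The claim is easy to check mechanically. The sketch below compares the two sets as multisets of pairwise distances; `delta_counts` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def delta_counts(points):
    """Multiset of pairwise distances as a Counter (distance -> multiplicity)."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


first = [0, 1, 2, 5, 7, 9, 12]
second = [0, 1, 5, 7, 8, 10, 12]
same_digest = delta_counts(first) == delta_counts(second)
print(same_digest)
```

Note that the second set is not the mirror image of the first (the mirror of the first is {0, 3, 5, 7, 10, 11, 12}), so this is a genuinely different map with the same partial digest.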


Exhaustive Search Algorithms

• Also known as brute force algorithms; examine every possible solution until you find a valid one

• Example (list sorting): look at all permutations of the elements in the list until you find the sorted one

• Usually impractical!
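The list-sorting example can be written out directly. This sketch is meant only to show the shape of an exhaustive search; with n! candidate permutations it is unusable beyond very short lists.

```python
from itertools import permutations


def brute_force_sort(items):
    """Try every permutation and return the first one that is in order."""
    for candidate in permutations(items):
        if all(a <= b for a, b in zip(candidate, candidate[1:])):
            return list(candidate)
    return []  # not reached: some permutation is always sorted


print(brute_force_sort([3, 1, 2]))
```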


Partial Digest: Brute Force

1. Find the restriction fragment of maximum length M. M is the length of the DNA sequence.

2. For every possible solution X, compute the corresponding ΔX

3. If ΔX is equal to the experimental partial digest L, then X is the correct restriction map


BruteForcePDP

BruteForcePDP(L, n):
    M ← maximum element in L
    for every set of n – 2 integers 0 < x2 < … < xn-1 < M
        X ← {0, x2, …, xn-1, M}
        Form ΔX from X
        if ΔX = L
            return X
    output “no solution”
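A direct Python rendering of the pseudocode, offered as a sketch rather than a reference implementation. The names `delta` and `brute_force_pdp` are assumptions, and `None` stands in for "no solution".

```python
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances, returned as a sorted list."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))


def brute_force_pdp(L, n):
    """Try every choice of n - 2 interior positions strictly between 0 and M."""
    target = sorted(L)
    M = max(L)
    for inner in combinations(range(1, M), n - 2):
        X = [0, *inner, M]
        if delta(X) == target:
            return X
    return None  # corresponds to: output "no solution"


L = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
print(brute_force_pdp(L, 5))
```

Because `combinations` enumerates candidates in lexicographic order, this returns the first valid map in that order; a mirror-image map with the same distances is never reported.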


Efficiency of BruteForcePDP

• BruteForcePDP must examine all $\binom{M-1}{n-2}$ possible sets of positions, so its running time is unfortunately $O(M^{n-2})$.

• One way to improve the algorithm is to limit the values of xi to only those values which occur in L


AnotherBruteForcePDP

AnotherBruteForcePDP(L, n):
    M ← maximum element in L
    for every set of n – 2 integers 0 < x2 < … < xn-1 < M from L
        X ← {0, x2, …, xn-1, M}
        Form ΔX from X
        if ΔX = L
            return X
    output “no solution”
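The same sketch with the restricted candidate pool. The restriction is safe because every position xi is also the distance from xi to the cut at 0, so it must occur in L. Function names are again assumptions for illustration.

```python
from itertools import combinations


def delta(points):
    """Multiset of pairwise distances, returned as a sorted list."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))


def another_brute_force_pdp(L, n):
    """Like the plain brute force, but interior positions are drawn from L only."""
    target = sorted(L)
    M = max(L)
    # Distinct values of L strictly between 0 and M are the only candidates.
    candidates = sorted({x for x in L if 0 < x < M})
    for inner in combinations(candidates, n - 2):
        X = [0, *inner, M]
        if delta(X) == target:
            return X
    return None  # corresponds to: output "no solution"


print(another_brute_force_pdp([2, 998, 1000], 3))
```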


Efficiency of AnotherBruteForcePDP

• It’s more efficient, but still slow
• At most $\binom{|L|}{n-2}$ sets are examined, but runtime is still exponential: $O(n^{2n-4})$
• If L = {2, 998, 1000} (n = 3, M = 1000), BruteForcePDP will be extremely slow, but AnotherBruteForcePDP will be quite fast
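To put numbers on the comparison, the sketch below counts how many candidate sets each variant would enumerate for the example L = {2, 998, 1000}. It assumes the second variant draws from the distinct values of L below M, which can only be fewer than |L|; the function names are hypothetical.

```python
from math import comb


def brute_force_candidates(M, n):
    """Number of ways to pick n - 2 interior positions from 1 .. M - 1."""
    return comb(M - 1, n - 2)


def another_brute_force_candidates(L, n):
    """Number of ways to pick n - 2 interior positions from values in L."""
    M = max(L)
    distinct = {x for x in L if 0 < x < M}
    return comb(len(distinct), n - 2)


L = [2, 998, 1000]
print(brute_force_candidates(max(L), 3), another_brute_force_candidates(L, 3))
```

The gap grows quickly with n, since the first count depends on the length M of the DNA while the second depends only on the number of fragments.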


Branch and Bound Algorithm for PDP

1. Begin with X = {0}

2. Remove the largest element in L and place it in X

3. See if the element fits on the right or left side of the restriction map

4. When it fits, find the other lengths it creates and remove those from L

5. Go back to step 2 until L is empty
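The steps above can be sketched as a recursive backtracking search. This is one possible rendering under stated assumptions, not necessarily the exact PartialDigest procedure from the textbook: it keeps L as a `Counter`, tries both placements at every step (including the first, so mirror-image maps are reported too), and collects every solution. All names are chosen for this example.

```python
from collections import Counter


def partial_digest(L):
    """Return every map X (as a sorted list) whose pairwise distances equal L."""
    remaining = Counter(L)
    width = max(remaining)       # the largest distance is the total length
    remaining[width] -= 1
    X = [0, width]
    solutions = set()

    def place():
        if sum(remaining.values()) == 0:
            solutions.add(tuple(sorted(X)))
            return
        largest = max(d for d, c in remaining.items() if c > 0)
        # The largest remaining distance must touch one of the two ends.
        for y in {largest, width - largest}:
            needed = Counter(abs(y - x) for x in X)
            if all(remaining[d] >= c for d, c in needed.items()):
                remaining.subtract(needed)   # take the distances out of L
                X.append(y)
                place()
                X.pop()                      # backtrack
                remaining.update(needed)     # put the distances back

    place()
    return [list(s) for s in sorted(solutions)]


print(partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
```

A branch is abandoned as soon as the distances created by a new point are not all available in L, which is what keeps the search far smaller than either brute-force variant on typical inputs.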


An Example

L = { 2, 2, 3, 3, 4, 5, 6, 7, 8, 10 }
X = { 0 }


An Example

L = { 2, 2, 3, 3, 4, 5, 6, 7, 8, 10 }
X = { 0 }

Remove 10 from L and insert it into X. We know this must be the length of the DNA sequence because it is the largest fragment.


An Example

L = { 2, 2, 3, 3, 4, 5, 6, 7, 8 }
X = { 0, 10 }


An Example

L = { 2, 2, 3, 3, 4, 5, 6, 7, 8 }
X = { 0, 10 }

Take 8 from L and make y = 2 or 8. But since the two cases are symmetric, we can assume y = 2. We find that the distances from y to the other elements in X are Δ(y, X) = {8, 2}, so we remove {8, 2} from L and add 2 to X.

Δ(y, X) = {|y – x1|, |y – x2|, …, |y – xn|} for X = {x1, x2, …, xn}
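The distance multiset for a single new point and the "fits in L" check can be written as two small helpers. This is a sketch with hypothetical names; the test values are taken from the running example.

```python
from collections import Counter


def delta_point(y, X):
    """Distances from a candidate point y to every point already in X."""
    return sorted(abs(y - x) for x in X)


def is_submultiset(needed, available):
    """True if every distance in needed occurs often enough in available."""
    need, have = Counter(needed), Counter(available)
    return all(have[d] >= c for d, c in need.items())


print(delta_point(2, [0, 10]))
print(is_submultiset(delta_point(6, [0, 2, 7, 10]), [2, 3, 4, 6]))
```

The check has to respect multiplicities: a candidate that needs the distance 4 twice does not fit if L contains 4 only once.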


An Example

L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }


An Example

L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }

Take 7 from L and make y = 7 or y = 10 – 7 = 3. We will explore y = 7 first, so Δ(y, X) = {7, 5, 3}. Therefore we remove {7, 5, 3} from L and add 7 to X.

Δ(y, X) = {7, 5, 3} = {|7 – 0|, |7 – 2|, |7 – 10|}


An Example

L = { 2, 3, 4, 6 }
X = { 0, 2, 7, 10 }


An Example

L = { 2, 3, 4, 6 }
X = { 0, 2, 7, 10 }

Take 6 from L and make y = 6. Unfortunately Δ(y, X) = {6, 4, 1, 4}, which is not a subset of L. Therefore we won’t explore this branch.


An Example

L = { 2, 3, 4, 6 }
X = { 0, 2, 7, 10 }

This time make y = 4. Δ(y, X) = {4, 2, 3, 6}, which is a subset of L, so we will explore this branch. We remove {4, 2, 3, 6} from L and add 4 to X.


An Example

L = { }
X = { 0, 2, 4, 7, 10 }


An Example

L = { }
X = { 0, 2, 4, 7, 10 }

L is now empty, so we have a solution, which is X.


An Example

L = { 2, 3, 4, 6 }
X = { 0, 2, 7, 10 }

To find other solutions, we backtrack.


An Example

L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }

We backtrack further.


An Example

L = { 2, 3, 3, 4, 5, 6, 7 }
X = { 0, 2, 10 }

This time we will explore y = 3. Δ(y, X) = {3, 1, 7}, which is not a subset of L, so we won’t explore this branch.


An Example

L = { 2, 2, 3, 3, 4, 5, 6, 7, 8 }
X = { 0, 10 }

We have backtracked to the root. Therefore we have found all the solutions.


Analyzing PartialDigest Algorithm

• Still exponential in worst case, but is very fast on average
• For n different fragments, if time of PartialDigest is T(n):
  – No branching case: T(n) < T(n-1) + O(n)
    • Quadratic
  – Branching case: T(n) < 2T(n-1) + O(n)
    • Exponential
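Unrolling the two recurrences shows where "quadratic" and "exponential" come from. In this sketch the constant c is an assumption standing in for the constant hidden in the O(n) term.

```latex
\begin{align*}
\text{No branching:}\quad
T(n) &\le T(n-1) + cn
      \le T(n-2) + c(n-1) + cn
      \le \dots
      \le T(1) + c\sum_{k=2}^{n} k = O(n^2) \\[4pt]
\text{Branching:}\quad
T(n) &\le 2T(n-1) + cn
      \le 4T(n-2) + 2c(n-1) + cn
      \le \dots
      \le 2^{n-1}T(1) + c\sum_{k=0}^{n-2} 2^{k}(n-k) = O(2^n)
\end{align*}
```

The last sum is bounded by a constant multiple of 2^n because, after substituting j = n - k, it equals 2^n times a partial sum of the convergent series of j / 2^j.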


Double Digest Mapping

• Double Digest is an experimentally simpler method than Partial Digest to generate data for a restriction map
  – Use two restriction enzymes; three full digests:
    • One with only first enzyme
    • One with only second enzyme
    • One with both enzymes

• Computationally more complex to figure out than partial digest problem


Double Digest Problem

Input: ΔA – the set of fragment lengths from the digest with first restriction enzyme A.
ΔB – the set of fragment lengths from the digest with second restriction enzyme B.
ΔX – the set of fragment lengths from the digest with both restriction enzymes A and B.

Output: A – location of the cuts in the restriction map for the first restriction enzyme.
B – location of the cuts in the restriction map for the second restriction enzyme.

Note: X = A ∪ B, and X contains 0 and t, with 0 ∈ A, B and t ∈ A, B.
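An exhaustive-search sketch for the double digest problem, in the same spirit as the brute-force partial digest. It tries every ordering of the two single-enzyme fragment lists and keeps the first pair of maps whose combined cuts reproduce the double digest. The names `dA`, `dB`, `dX` for the three fragment-length lists, the helper functions, and the example data are all assumptions made for this illustration.

```python
from itertools import permutations


def cut_positions(fragments):
    """Cut locations, including 0 and the total length, for ordered fragments."""
    positions = [0]
    for length in fragments:
        positions.append(positions[-1] + length)
    return positions


def fragment_lengths(positions):
    """Sorted fragment lengths between consecutive cut positions."""
    ordered = sorted(positions)
    return sorted(b - a for a, b in zip(ordered, ordered[1:]))


def brute_force_ddp(dA, dB, dX):
    """Return maps (A, B) consistent with all three digests, or None."""
    if not (sum(dA) == sum(dB) == sum(dX)):
        return None  # the three digests must cover the same total length
    target = sorted(dX)
    for order_a in sorted(set(permutations(dA))):
        A = cut_positions(order_a)
        for order_b in sorted(set(permutations(dB))):
            B = cut_positions(order_b)
            if fragment_lengths(set(A) | set(B)) == target:
                return A, B
    return None


print(brute_force_ddp([3, 7], [5, 5], [2, 3, 5]))
```

The search space is the product of the numbers of orderings of the two fragment lists, so this only works for a handful of fragments.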


Double Digest: Example


Double Digest: Example

Without the information about ΔX (i.e., the digest with both enzymes, A+B), it is impossible to solve the double digest problem, as this diagram illustrates


Double Digest: Multiple Solutions