Adleman and computing on a surface

1 Introduction

2 Theoretical background Biochemistry/molecular biology

3 Theoretical background computer science

4 History of the field

5 Splicing systems

6 P systems

7 Hairpins

8 Detection techniques

9 Micro technology introduction

10 Microchips and fluidics

11 Self assembly

12 Regulatory networks

13 Molecular motors

14 DNA nanowires

15 Protein computers

16 DNA computing - summary

17 Presentation of essay and discussion

Course outline

Who’s who?

Tom Head

http://www.math.binghamton.edu/tom/

Areas of interest

Algebra

Computing with biomolecules

Formal representations of communication

Department of Mathematical Sciences

Binghamton University

http://www.usc.edu/dept/molecular-science/fm-adleman.htm

Areas of interest

Method for Obtaining Digital Signatures and Public-Key Cryptosystems

Distinguishing Prime Numbers From Composite Numbers

The First Case of Fermat's Last Theorem

Primality Testing And Two Dimensional Abelian Varieties Over Finite Fields

Molecular Computation of Solutions To Combinatorial Problems

Leonard Adleman

Turing Award 2002

Department of Computer Science

Theoretical Computer Science College of

Computing, Georgia Tech

Richard Lipton

http://www.cc.gatech.edu/computing/Theory/theory.html

Areas of interest Algorithms and Complexity Theory Cryptography DNA Computing

Laura Landweber

http://www.princeton.edu/~lfl/

Areas of interest

Origins of Genes, Genomes and the Genetic Code

Early Pathways of RNA Evolution

Scrambled Genes

RNA Editing

Gene Scrambling

DNA Computing

Dept. of Ecology and Evolutionary Biology

Princeton University

John Reif

http://www.cs.duke.edu/~reif/

Computer Science

Duke University

Areas of interest

DNA nanostructures

Molecular Computation

Efficient Algorithms

Parallel Computation

Robotic Motion Planning

Optical Computing

Erik Winfree

http://www.dna.caltech.edu/~winfree/

Computer Science; Computation and Neural Systems

Caltech

Areas of interest

DNA-based computers

Computing by self-assembly

Genetic Regulatory Networks

Signal Transduction Cascades

Ribosomal Translation

DNA and RNA folding

MacArthur Fellow 2000

Nadrian Seeman

Department of Chemistry

New York University

Areas of interest

DNA Nanotechnology

Macromolecular Design and Topology

Biophysical Chemistry of Recombinational Intermediates

DNA-Based Computation

Crystallography

http://www.nyu.edu/pages/chemistry/faculty/seeman.html

Robert Corn

http://corninfo.chem.wisc.edu/

Chemistry Department

University of Wisconsin

Areas of interest

Surface plasmon resonance (SPR) to monitor biopolymer adsorption

The chemical modification of surfaces

Characterization of molecular monolayers

Electron transfer processes at liquid/liquid electrochemical interfaces

DNA computing algorithms at surfaces

Multilayer polyelectrolyte films for ion transport applications

Hagiya Masami

http://hagi.is.s.u-tokyo.ac.jp

Department of Computer Science,

University of Tokyo

Areas of interest

Automated Deduction, Formal Verification and Programming Languages

Bio-Computing

Hybrid Systems...

Akira Suyama

http://talent.c.u-tokyo.ac.jp/suyama/

Graduate School of Arts and Sciences,

University of Tokyo

Areas of interest

SNPs

Probe design

DNA chips

Quantitative gene expression

Hybrid Systems...

John Rose

Areas of interest

The DNA chip, especially Tag-Antitag Systems

Whiplash PCR, a simple autonomous DNA computer

Equilibrium chemistry/statistical thermodynamic model

http://hagi.is.s.u-tokyo.ac.jp/~johnrose/

Department of Computer Science,

University of Tokyo

Gheorghe Păun

Areas of interest

Formal language theory (and applications)

Combinatorics on words

Semiotics

Operational research

DNA Computing

Membrane Computing

http://stoilow.imar.ro/~gpaun/

Institute of Mathematics of

the Romanian Academy

Grzegorz Rozenberg

http://www.wi.leidenuniv.nl/~rozenber/

Institute of Advanced Computer Science

University of Leiden

Areas of interest

Molecular Computing

Evolutionary Algorithms

Neural Networks

Areas of interest

H systems

P systems

Neural Networks

Giancarlo Mauri

http://bioinformatics.bio.disco.unimib.it/

Dipartimento di Informatica,

Sistemistica e Comunicazione (DISCo)

Milano

Ehud Shapiro

Areas of interest

DNA as input fuel

Biological nanocomputer

Turing machine-like model

http://www.weizmann.ac.il/mathusers/lbn/index.html

Computer Science and Applied Mathematics

the Weizmann Institute

Byoung-Tak Zhang

http://scai.snu.ac.kr/~btzhang/

Areas of interest

Evolutionary Intelligence

Neural Intelligence

Molecular Intelligence

Computational Learning Theory

School of Computer Science and Engineering

Seoul National University

Danny van Noort

http://bi.snu.ac.kr/~danny/

Areas of interest

Microstructure design and fabrication

DNA-hybridisation

Instrumentation

Fluorescent microscopy

Affinity biosensors

Protein chips

DNA computing

Cell behaviour

School of Computer Science and Engineering

Seoul National University

NP complete problems

Tractable and intractable problems

NP-complete problems

The theory of NP-completeness

Classify problems as tractable or intractable.

A problem is tractable if there exists at least one polynomially bounded algorithm that solves it.

An algorithm is polynomially bounded if its worst-case growth rate can be bounded by a polynomial $p(n)$ in the size $n$ of the problem:

$p(n) = a_k n^k + \dots + a_1 n + a_0$, where $k$ is a constant.

Classifying problems

• A problem is intractable if it is not tractable.

• No algorithm that solves the problem is polynomially bounded.

• Every such algorithm has a worst-case growth rate $f(n)$ that cannot be bounded by a polynomial $p(n)$ in the size $n$ of the problem.

• For intractable problems the bounds are of the form $f(n) = c^n$, $n^{\log n}$, etc.

Intractable problems
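To make the difference between the bounds concrete, the following minimal sketch tabulates one polynomial bound against the two non-polynomial bounds named above. The choices $p(n) = n^3$, $c = 2$ and a base-2 logarithm are illustrative assumptions, not values from the slides.

```python
import math


def growth_table(sizes):
    """Return rows (n, n**3, 2**n, n**log2(n)) for each problem size n."""
    rows = []
    for n in sizes:
        poly = n ** 3                      # a polynomial bound (assumed p(n) = n^3)
        expo = 2 ** n                      # c**n with the assumed constant c = 2
        quasi = round(n ** math.log2(n))   # n**(log n), base-2 logarithm assumed
        rows.append((n, poly, expo, quasi))
    return rows


if __name__ == "__main__":
    for row in growth_table([10, 20, 40, 80]):
        print(row)
```

Even at these small sizes the exponential column quickly dwarfs the polynomial one, which is the practical meaning of "intractable".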

There are many practical problems for which no one has yet found a polynomially bounded algorithm.

Examples: traveling salesperson, 0/1 knapsack, graph coloring, bin packing, etc.

Most design automation problems, such as testing and routing.

Many network, database and graph problems.

Hard practical problems

The theory of NP-completeness makes it possible to show that these problems are at least as hard as the NP-complete problems.

The practical implication of knowing that a problem is NP-complete is that it is probably intractable (whether it is or not has not yet been proved).

So any algorithm that solves it will probably be very slow for large inputs.

The theory of NP-completeness

A decision problem has a yes or no answer for a given input.

Examples:

Given a graph G, is there a path from s to t of length at most k?

Does graph G contain a Hamiltonian cycle?

Given a graph G, is it bipartite?

Decision problems

A Hamiltonian cycle of a graph G is a

cycle that includes each vertex of the

graph exactly once.

Problem: Given a graph G, does G have

a Hamiltonian cycle?

Decision problem: Hamiltonian cycle

P is the class of decision problems that are polynomially bounded.

Is the following problem in P?

Given a weighted graph G, is there a spanning tree of weight at most B?

The decision versions of problems such as shortest distance and minimum spanning tree belong to P.

The class P

NP is the class of decision problems for which there is a polynomially bounded verification algorithm.

It can be shown that all decision problems in P, as well as decision problems such as traveling salesman, knapsack and bin packing, are in NP.

The class NP
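A polynomially bounded verification algorithm can be illustrated with the Hamiltonian cycle problem defined earlier. The sketch below is a minimal example: the function name, the vertex labelling 0 to n-1 and the undirected edge list are assumptions made for illustration. It checks a proposed certificate (an ordering of the vertices) in time polynomial in the size of the graph; it does not find a cycle.

```python
def verify_hamiltonian_cycle(n, edges, certificate):
    """Check in polynomial time that `certificate` is a Hamiltonian cycle.

    n           -- number of vertices, labelled 0 .. n-1
    edges       -- iterable of undirected edges (u, v)
    certificate -- proposed ordering of the vertices along the cycle
    """
    edge_set = {frozenset(e) for e in edges}
    # Every vertex must appear exactly once in the certificate.
    if sorted(certificate) != list(range(n)):
        return False
    # Consecutive vertices, wrapping around at the end, must share an edge.
    for i in range(n):
        u, v = certificate[i], certificate[(i + 1) % n]
        if frozenset((u, v)) not in edge_set:
            return False
    return True


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(verify_hamiltonian_cycle(4, square, [0, 1, 2, 3]))
    print(verify_hamiltonian_cycle(4, square, [0, 2, 1, 3]))
```

Verifying takes one pass over the certificate, whereas no polynomially bounded algorithm is known for deciding whether such a certificate exists.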

$P \subseteq NP$

If a problem is solvable in polynomial

time, a polynomial time verification

algorithm can easily be designed that

ignores the certificate and answers “yes”

for all inputs with the answer “yes”.

The relation between P and NP

It is not known whether P = NP.

Problems in P can be solved “quickly”

Problems in NP can be verified “quickly”.

It is easier to verify a solution than to

solve a problem.

Some researchers believe that P and NP

are not the same class.

The relation between P and NP

A problem A is NP-complete if

1. It is in NP and

2. For every other problem A’ in NP, $A' \le_p A$ (A’ is polynomial-time reducible to A)

A problem A is NP-hard if

For every other problem A’ in NP, $A' \le_p A$

NP-complete problems

Cook’s theorem

Satisfiability is NP-complete

This was the first problem shown to be NP-complete

Other problems

the decision version of knapsack,

the decision version of traveling salesman

Examples of NP-complete problems

Satisfiability problem

First, Conjunctive Normal Form (CNF)

will be defined

Then, the Satisfiability problem will

be defined

The satisfiability problem

A logical (Boolean) variable is a variable that may be assigned the value true or false ($x$, $y$, $w$ and $z$ are Boolean variables)

A literal is a logical variable or the negation of a logical variable ($x$ and $\lnot y$ are literals)

A clause is a disjunction of literals ($(w \lor x \lor y)$ and $(\lnot x \lor y)$ are clauses)

Conjunctive normal form (CNF)

A logical (Boolean) expression is in Conjunctive Normal Form if it is a conjunction of clauses.

The following expression is in conjunctive normal form:

$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$

Conjunctive normal form (CNF)

Is there a truth assignment to the n

variables of a logical expression in

Conjunctive Normal Form which makes the

value of the expression true?

For the answer to be yes, all clauses

must evaluate to true

Otherwise the answer is no

The satisfiability problem

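$x = F$, $y = F$, $w = T$ and $z = T$ is a truth assignment for:

$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$

Note that if $y = F$ then $\lnot y = T$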

Each clause evaluates to true

The satisfiability problem
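The check that each clause evaluates to true can be sketched in a few lines. The formula used here is $(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$; the placement of the negations is an assumption, reconstructed from the four solutions listed later in the 4-variable SAT demo. The data layout and function names are illustrative only.

```python
# A literal is (variable, is_positive); a clause is a list of literals.
# Negation placement is an assumption reconstructed from the listed solutions.
FORMULA = [
    [("w", True), ("x", True), ("y", True)],    # (w or x or y)
    [("w", True), ("y", False), ("z", True)],   # (w or not y or z)
    [("x", False), ("y", True)],                # (not x or y)
    [("w", False), ("y", False)],               # (not w or not y)
]


def clause_value(clause, assignment):
    """A clause is true if at least one of its literals is true."""
    return any(assignment[var] == positive for var, positive in clause)


def satisfies(formula, assignment):
    """A CNF formula is true only if every clause is true."""
    return all(clause_value(clause, assignment) for clause in formula)


ASSIGNMENT = {"w": True, "x": False, "y": False, "z": True}

if __name__ == "__main__":
    print(satisfies(FORMULA, ASSIGNMENT))
```

Evaluating one assignment takes time linear in the formula size; the difficulty lies in the $2^n$ assignments that may need to be tried.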

Adleman’s experiment

The 1994 experiment

DNA computer

The 1994 experiment

Basic Idea

Perform molecular biology experiment

to find solution to math problem.

The 1994 experiment

(Proposed by William Hamilton)

Given a network of nodes and directed connections between them, is there a path through the network that begins at the start node and ends at the end node, visiting each node exactly once (a “Hamiltonian path”)?

Does a Hamiltonian path exist, or not?

Hamiltonian path

[Figure: directed network of four cities (Atlanta, Boston, Chicago, Detroit) with the start city and end city marked]

Hamiltonian path does exist

[Figure: the same four-city network (Atlanta, Boston, Chicago, Detroit) with the start city and end city swapped]

Hamiltonian path does not exist

Generation-&-Test Algorithm

Step 1 Generate random paths on the network.

Step 2 Keep only those paths that begin with

start city and conclude with end city.

Step 3 If there are N cities, keep only those

paths of length N.

Step 4 Keep only those that enter all cities at

least once.

Step 5 Any remaining paths are solutions (i.e.,

Hamiltonian paths).

Solving the Hamiltonian problem
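The five steps can be sketched in software as follows. Two things are assumptions rather than content of the slides: random generation in Step 1 is swapped for exhaustive enumeration of all walks, so that the result is deterministic, and the edge set is a hypothetical one inferred from the example paths listed below (cities abbreviated A, B, C, D).

```python
from itertools import product

NODES = ["A", "B", "C", "D"]
# Hypothetical directed edges, inferred from the example paths in the slides.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "A"),
         ("A", "D"), ("D", "B"), ("C", "B")]


def hamiltonian_paths(nodes, edges, start, end):
    """Generate-and-test search for Hamiltonian paths from start to end."""
    n = len(nodes)
    edge_set = set(edges)
    # Steps 1 and 3: enumerate every walk that visits exactly n nodes
    # (exhaustive enumeration stands in for random generation).
    walks = [w for w in product(nodes, repeat=n)
             if all((a, b) in edge_set for a, b in zip(w, w[1:]))]
    # Step 2: keep walks that begin with the start city and end with the end city.
    walks = [w for w in walks if w[0] == start and w[-1] == end]
    # Step 4: keep walks that enter every city at least once.
    # Step 5: whatever remains is a Hamiltonian path.
    return [w for w in walks if set(w) == set(nodes)]


if __name__ == "__main__":
    print(hamiltonian_paths(NODES, EDGES, "A", "D"))
    print(hamiltonian_paths(NODES, EDGES, "D", "A"))
```

The enumeration in the first step touches $N^N$ candidate sequences, which is exactly the serial cost that the DNA approach tries to pay in parallel.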

[X] D -> B -> A

[X] B -> C -> D -> B -> A -> B

[X] A -> B -> C -> B

[X] C -> D -> B -> A

[X] A -> B -> A -> D

[O] A -> B -> C -> D

[X] A -> B -> A -> B -> C -> D

The paths

Solving the Hamiltonian problem

The total number of paths grows exponentially as the network size increases: e.g. $10^6$ paths for N=10 cities, $10^{12}$ paths for N=20 and $10^{100}$ paths for N=100.

The Generation-&-Test algorithm takes “forever”. Some sort of smart algorithm must be devised; none has been found so far (the problem is NP-hard).

Combinatorial explosion

The key to solving the problem is to use DNA to perform the five steps of the Generation-&-Test algorithm as a parallel search instead of a serial search.

Finding a solution with DNA

Protein that produces complementary DNA strand

A -> T, T -> A, C -> G, G -> C

Requires primer and starter

Enables DNA to reproduce

Intermezzo: DNA polymerase

The bio-nanomachine

hops onto DNA strand

slides along

reads each base

writes its complement

onto new strand

Intermezzo: DNA polymerase

Ingredients and tools needed

DNA strands that encode city names and

connections between them

Polymerases, ligase, water, salt, other

ingredients

Polymerase chain reaction (PCR) set

Gel electrophoresis tool (that filters

out non-solution strands)

Experimental set-up

Gel electrophoresis

[Figure: directed network of four cities (Atlanta, Boston, Chicago, Detroit) with the start city and end city marked]

Solving a Hamiltonian path problem

CITY      DNA NAME   COMPLEMENT
ATLANTA   ACTTGCAG   TGAACGTC
BOSTON    TCGGACTG   AGCCTGAC
CHICAGO   GGCTATGT   CCGATACA
DETROIT   CCGAGCAA   GGCTCGTT

City coding

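[Figure: the Atlanta-Boston link strand GCAGTCGG (second half of Atlanta, first half of Boston) paired with the complement strands TGAACGTC (Atlanta) and AGCCTGAC (Boston)]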

City coding with DNA
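The coding scheme can be reproduced with a short sketch. The city words and their complements are taken from the table; the rule for building a link strand (second half of the source city followed by the first half of the destination city) is read off the Atlanta-Boston example GCAGTCGG, and the function names are illustrative assumptions. Note that `complement` pairs bases position by position, as in the table, without reversing the strand.

```python
CITY = {
    "ATLANTA": "ACTTGCAG",
    "BOSTON": "TCGGACTG",
    "CHICAGO": "GGCTATGT",
    "DETROIT": "CCGAGCAA",
}
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Base-by-base Watson-Crick complement (A<->T, C<->G), not reversed."""
    return "".join(PAIR[base] for base in strand)


def link(src, dst):
    """Link strand for the connection src -> dst:
    second half of the source word plus first half of the destination word."""
    half = len(CITY[src]) // 2
    return CITY[src][half:] + CITY[dst][:half]


if __name__ == "__main__":
    for name, word in CITY.items():
        print(name, word, complement(word))
    print("ATLANTA-BOSTON", link("ATLANTA", "BOSTON"))
```

Because the link strand overlaps half of each city word, it can hold the Atlanta and Boston complement strands side by side, which is what lets paths assemble themselves in the tube.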

[Figure: city network (Atlanta, Boston, Chicago, Detroit) with the start city and end city marked]

Atlanta-Boston, Boston-Chicago, Chicago-Detroit

Atlanta*, Boston*, Chicago*, Detroit*

Possible paths

[Figure: city network (Atlanta, Boston, Chicago, Detroit) with the start city and end city marked]

Boston-Atlanta, Atlanta-Detroit

Boston*, Atlanta*, Detroit*

Possible paths

In pictures

1. In a test tube, mix the prepared DNA pieces together; they link with each other at random, forming all the different paths.

2. Perform PCR with the ‘start’ and ‘end’ DNA pieces as primers, which creates millions of copies of the DNA strands that have the right start and end.

3. Perform gel electrophoresis to isolate only those pieces of the right length (e.g., N=4).

The DNA experiment

4. Use DNA ‘probe’ molecules to check whether the paths pass through all intermediate cities.

5. All DNA pieces that are left in the tube should be precisely those representing Hamiltonian paths.

If the tube contains any DNA at all, conclude that a Hamiltonian path exists; otherwise conclude that it does not.

When a path exists, the DNA sequence represents the specific path of the solution.

The DNA experiment

Why does it work?

Enormous parallelism, with $10^{23}$ DNA pieces working in parallel to find the solution simultaneously.

Takes less than a week (vs. thousands of years for a supercomputer)

Extraordinarily energy efficient ($10^{-10}$ of supercomputer energy use)

Note this is a Universal Turing machine

Summary and conclusion

[Figures: sequence of panels showing the experimental set-up, labelled “CAPTURE LAYER (-R or G)”, “- +” and “HOT”]

Experimental set-up

DNA computing on a surface

DNA computing on surfaces

Advantages over “solution phase” chemistry:

Facile purification steps

Reduced interference between strands

Easily automated

Disadvantages:

Loss of information density (2D)

Lower surface hybridization efficiency

Slower surface enzyme kinetics

DNA computing on surfaces

DNA strands representing the set {0,1}^n are

synthesized and subsequently immobilized on

a surface in a non-addressed fashion

DNA surface model: input

A strand is composed of words. Each word is a short DNA strand (16mer) representing one or more bits.

[Figure: strand layout showing words 1 to 4 and the bits they carry, repeated along the strand]

Encoding binary information

Requirements of a “DNA code”

Success in specific hybridization between a DNA code word and its Watson-Crick complement

Few false positive signals

Virtually all designs enforce combinatorial

constraints on the code words

Applications:

Information storage, retrieval for DNA

computing

Molecular bar codes for chemical libraries

DNA word design problem

Hamming: distance between two code words

should be large

Reverse complement: distance between a

word and the reverse complement of

another word should be large

Also: frame shift, distinct sub-words,

forbidden sub-words, …

DNA word design problem
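The first two constraints can be checked mechanically. The sketch below is a minimal illustration, assuming equal-length words over the alphabet A, C, G, T; the function names are hypothetical. It reports the smallest Hamming distance between any two code words and the smallest distance between a word and the reverse complement of another word.

```python
from itertools import combinations, permutations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(word):
    """Watson-Crick complement of the word, read in the opposite direction."""
    return word.translate(COMPLEMENT)[::-1]


def min_distances(words):
    """Return (min Hamming distance, min reverse-complement distance)."""
    word_dist = min(hamming(a, b) for a, b in combinations(words, 2))
    rc_dist = min(hamming(a, reverse_complement(b))
                  for a, b in permutations(words, 2))
    return word_dist, rc_dist


if __name__ == "__main__":
    code = ["ACTTGCAG", "TCGGACTG", "GGCTATGT", "CCGAGCAA"]
    print(min_distances(code))
```

A design is acceptable under these two constraints when both reported minima exceed a chosen threshold; frame shift and sub-word constraints would need further checks.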

Seeman (1990): de novo design of sequences

for nucleic acid structural engineering

Brenner (1997): sorting polynucleotides

using DNA tags

Shoemaker et al. (1996): analysis of yeast

deletion mutants using a parallel molecular

bar-coding strategy

Many other examples in DNA computing

Work on DNA code design

Word design example

MARK strands in which bit j = 0 (or 1): hybridize with Watson-Crick complements of the word containing bit j, followed by polymerization

DESTROY

UNMARK

DNA surface model: process

MARK strands in which bit j = 0 (or 1)

DESTROY unmarked strands: exonuclease degradation

UNMARK

DNA surface model: process

MARK strands in which bit j = 0 (or 1)

DESTROY unmarked strands

UNMARK strands: wash in distilled water

DNA surface model: process

Detect remaining strands (if any) by

detaching strands from surface and

amplifying using PCR (polymerase

chain reaction).

DNA surface model: output

Theorem Any CNFSAT formula of size m

can be computed using O(m) mark,

unmark and destroy operations.

Theorem Any circuit of size m can be

computed using O(m) mark, unmark,

destroy, and append operations.

Computational power

Input: 16 strands

Process:

MARK if bit z = 1; MARK if bit w = 1; MARK if bit y = 0; DESTROY; UNMARK

MARK if bit w = 0; MARK if bit y = 0; DESTROY; UNMARK

…

Output: exactly those strands that satisfy the circuit remain on the surface.

[Figure: circuit of “and”, “or” and “not” gates over the inputs w, x, y and z]

The satisfiability problem

$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$

{0000} {0001} {0010} {0011} {0100} {0101} {0110} {0111} {1000} {1001} {1010} {1011} {1100} {1101} {1110} {1111}

4-variable SAT demo

The logic of the DNA computation in each cycle, leading at the end to four types of DNA molecules remaining on the surface.

The identity of those molecules that correspond to the solutions was determined by PCR.

Solution:

S3

S7

S8

S9

4-variable SAT demo

S3: w=0, x=0, y=1, z=1

S7: w=0, x=1, y=1, z=1

S8: w=1, x=0, y=0, z=0

S9: w=1, x=0, y=0, z=1

y=1: $(w \lor x \lor y)$

z=1: $(w \lor \lnot y \lor z)$

x=0 or y=1: $(\lnot x \lor y)$

w=0: $(\lnot w \lor \lnot y)$

4-variable SAT, the answers
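The mark, destroy and unmark cycle can be simulated in software to check the listed answers. This is a sketch under stated assumptions, not the laboratory protocol: strands are modelled as 4-bit strings in the order w, x, y, z, the formula is $(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$ with negations reconstructed from the solutions S3, S7, S8 and S9, and all names are illustrative.

```python
from itertools import product

VARS = "wxyz"
# Each clause lists (variable, bit value that satisfies the clause).
# Negation placement is an assumption reconstructed from the listed solutions.
CLAUSES = [
    [("w", 1), ("x", 1), ("y", 1)],
    [("w", 1), ("y", 0), ("z", 1)],
    [("x", 0), ("y", 1)],
    [("w", 0), ("y", 0)],
]


def run_surface(clauses):
    """Simulate one MARK/DESTROY/UNMARK cycle per clause.

    Returns the surviving strands (sorted) and the number of operations used.
    """
    # Input: all 2**4 = 16 strands attached to the surface.
    surface = {"".join(bits) for bits in product("01", repeat=len(VARS))}
    ops = 0
    for clause in clauses:
        marked = set()
        for var, value in clause:
            j = VARS.index(var)
            # MARK every strand whose bit j has the satisfying value.
            marked |= {s for s in surface if int(s[j]) == value}
            ops += 1
        surface = marked   # DESTROY all unmarked strands
        ops += 1
        ops += 1           # UNMARK: marks are discarded before the next cycle
    return sorted(surface), ops


if __name__ == "__main__":
    strands, ops = run_surface(CLAUSES)
    print(strands, ops)
```

Each clause costs one MARK per literal plus one DESTROY and one UNMARK, so the operation count grows linearly with the formula size, in line with the O(m) bound stated in the theorem.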

Synthesize; Attach

Mark

Destroy

Unmark

Readout

Cycle

4-variable SAT demo

Solid-phase chemistry is a promising approach

to DNA computing

DNA computing will require greatly improved

DNA surface attachment chemistries and control

of chemical and enzymatic processes

Conclusions
