Course outline

  • Upload
    karsen


Page 1: Course outline
Page 2: Course outline

1 Introduction
2 Theoretical background: biochemistry/molecular biology
3 Theoretical background: computer science
4 History of the field
5 Splicing systems
6 P systems
7 Hairpins
8 Detection techniques
9 Micro technology introduction
10 Microchips and fluidics
11 Self assembly
12 Regulatory networks
13 Molecular motors
14 DNA nanowires
15 Protein computers
16 DNA computing - summary
17 Presentation of essay and discussion

Course outline

Page 3: Course outline

More serious

Page 4: Course outline

There is a solid theoretical foundation for splicing as an operation on formal languages.

In biochemical terms, procedures based on splicing may have some advantages, since the DNA is used mostly in its double stranded form, and thus many problems of unintentional annealing may be avoided.

The basic model is a single tube, containing an initial population of dsDNA, several restriction enzymes, and a ligase. Mathematically this is represented as a set of strings (the initial language), a set of cutting operations, and a set of pasting operations.

The model has been proved to be equivalent to a universal Turing machine.

Tom Head – splicing systems

Page 5: Course outline

These are the techniques that are common in the microbiologist's lab and can be used to program a molecular computer. DNA can be:
synthesize: desired strands can be created
separate: strands can be sorted and separated by length
merge: by pouring two test tubes of DNA into one to perform union
extract: extract those strands containing a given pattern
melt/anneal: breaking/bonding two ssDNA molecules with complementary sequences
amplify: use of PCR to make copies of DNA strands
cut: cut DNA with restriction enzymes
rejoin: rejoin DNA strands with 'sticky ends'
detect: confirm presence or absence of DNA
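Some of the operations listed above can be imitated on strings. The following is only a rough sketch, assuming a test tube is modeled as a multiset of strands (a Python `Counter`); the function names and the idealized doubling in `amplify` are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def merge(tube_a, tube_b):
    """Pour two tubes together (multiset union)."""
    return tube_a + tube_b

def extract(tube, pattern):
    """Keep only the strands that contain the given pattern."""
    return Counter({s: n for s, n in tube.items() if pattern in s})

def separate(tube):
    """List the distinct strands sorted by length, as a gel would."""
    return sorted(tube, key=len)

def amplify(tube, cycles=1):
    """Idealized PCR (assumption): every cycle doubles each strand count."""
    return Counter({s: n * 2 ** cycles for s, n in tube.items()})

def detect(tube):
    """Confirm presence or absence of DNA."""
    return sum(tube.values()) > 0

# Hypothetical tube contents, chosen only for illustration.
tube = merge(Counter({"ATTGACCC": 1}), Counter({"CAATCAGG": 2, "GG": 1}))
hits = extract(tube, "CAG")
```

Melting, annealing, cutting and rejoining are left out here because they need a double-stranded representation.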

Tom Head – splicing systems

Page 6: Course outline

Initial set (finite or infinite) consists of double-stranded DNA molecules

Specific classes of enzymatic activities are considered: those of restriction enzymes

Recombinant behavior modeled and associated sets analyzed by new formalism called Splicing Systems

Attention focused on effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and reassociated to produce further molecules.

Tom Head – splicing systems

Page 7: Course outline

Circular DNA and Splicing Systems

DNA molecules exist not only in linear forms but also in circular forms. 

Splicing systems

Page 8: Course outline

SPLICING

LINEAR

CIRCULAR

Splicing systems

Page 9: Course outline

Linear splicing

Page 10: Course outline

Molecules: …ATTGACCC… and …CAATCAGG…
Cut sites: G|A and AT|C
After cutting: …ATTG + ACCC… and …CAAT + CAGG…
After ligase: …ATTGCAGG… and …CAATACCC…

Splicing in nature

Page 11: Course outline

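$V$: alphabet

Splicing rule: $r = u_1 \mid u_2 \,\$\, u_3 \mid u_4$, with $u_1, u_2, u_3, u_4 \in V^*$

For $x, y, z, w \in V^*$: $(x, y) \vdash_r (z, w)$ if and only if

$x = x_1 u_1 u_2 x_2$, $y = y_1 u_3 u_4 y_2$, with $x_1, x_2, y_1, y_2 \in V^*$, and

$z = x_1 u_1 u_4 y_2$, $w = y_1 u_3 u_2 x_2$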
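The splicing step can be tried on plain strings. This is a minimal sketch, assuming a rule is passed as a tuple (u1, u2, u3, u4) and that every occurrence of the two sites is used; the function name `splice` is an assumption made for illustration.

```python
def splice(x, y, rule):
    """Return all pairs (z, w) obtained from (x, y) by one rule application.

    rule = (u1, u2, u3, u4): x is cut between u1 and u2,
    y is cut between u3 and u4, and the pieces are recombined.
    """
    u1, u2, u3, u4 = rule
    results = set()
    i = x.find(u1 + u2)
    while i != -1:
        j = y.find(u3 + u4)
        while j != -1:
            x1, x2 = x[:i], x[i + len(u1 + u2):]
            y1, y2 = y[:j], y[j + len(u3 + u4):]
            # z = x1 u1 u4 y2 and w = y1 u3 u2 x2
            results.add((x1 + u1 + u4 + y2, y1 + u3 + u2 + x2))
            j = y.find(u3 + u4, j + 1)
        i = x.find(u1 + u2, i + 1)
    return results

# The "splicing in nature" example: sites G|A and AT|C.
pairs = splice("ATTGACCC", "CAATCAGG", ("G", "A", "AT", "C"))
```

With the two molecules from the "Splicing in nature" slide this yields the single pair (ATTGCAGG, CAATACCC).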

Splicing in DNA computing

Page 12: Course outline

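$\gamma = (V, T, A, R)$
$V$: alphabet
$T \subseteq V$: terminal alphabet
$A \subseteq V^*$: set of strings
$R$: splicing rules

$L(\gamma) = \sigma^*(A) \cap T^*$

If $A, R \in FIN$ then $L(\gamma) \in REG$.

… with permitting context: each rule $u_1 \mid u_2 \,\$\, u_3 \mid u_4 \in R$ is given contexts $C_1, C_2$ over $V^*$.

If $A, R \in FIN$ then $L(\gamma) \in RE$.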
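The language of an extended H-system (V, T, A, R) without permitting contexts can be approximated by iterating the splicing step. The sketch below is an under-approximation under a stated assumption: strings longer than `max_len` are discarded so that the loop terminates. The names `bounded_language` and `max_len` and the example system are hypothetical.

```python
def splice(x, y, rule):
    """One application of rule (u1, u2, u3, u4) to the pair (x, y)."""
    u1, u2, u3, u4 = rule
    out = set()
    i = x.find(u1 + u2)
    while i != -1:
        j = y.find(u3 + u4)
        while j != -1:
            x1, x2 = x[:i], x[i + len(u1 + u2):]
            y1, y2 = y[:j], y[j + len(u3 + u4):]
            out.add((x1 + u1 + u4 + y2, y1 + u3 + u2 + x2))
            j = y.find(u3 + u4, j + 1)
        i = x.find(u1 + u2, i + 1)
    return out

def bounded_language(axioms, rules, terminals, max_len):
    """Close the axioms under splicing (up to max_len), then keep terminal strings."""
    words = set(axioms)
    changed = True
    while changed:
        changed = False
        for x in list(words):
            for y in list(words):
                for rule in rules:
                    for pair in splice(x, y, rule):
                        for s in pair:
                            if len(s) <= max_len and s not in words:
                                words.add(s)
                                changed = True
    # Intersection with T*: only strings over the terminal alphabet survive.
    return {w for w in words if set(w) <= set(terminals)}

# Hypothetical system: V = {a, b, c, d}, T = {a, b, c}, A = {cab, d}, R = {a|b $ c|a}.
language = bounded_language({"cab", "d"}, [("a", "b", "c", "a")], "abc", 5)
```

In this toy system the axiom d is never spliced and is removed by the terminal filter; the remaining strings have the form c a^n b.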

Extended H-system

Page 13: Course outline

[Figure: Rotation. Splicing steps 1 to 4 applied to strings built from the markers h1, hA, t1, tA, the symbols a, A and the block bs1c, with the contexts {s1}, {h1, s1}, {hA, s1}, {s1, tA}.]

Page 14: Course outline

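Rule: $r = u_1 \mid u_2 \,\$\, u_3 \mid u_4$

$(x u_1 u_2 y,\ w u_3 u_4 z) \vdash_r (x u_1 u_4 z,\ w u_3 u_2 y)$

Pattern recognition: locate the sites $u_1 u_2$ in $x u_1 u_2 y$ and $u_3 u_4$ in $w u_3 u_4 z$
Cut: between $u_1$ and $u_2$, and between $u_3$ and $u_4$
Paste: $x u_1$ with $u_4 z$, and $w u_3$ with $u_2 y$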

Păun’s linear splicing operation (1996)

Page 15: Course outline

Circular splicing

Page 16: Course outline

restriction enzyme 1 restriction enzyme 2

ligase enzymes

Circular splicing

Page 17: Course outline

Conjugacy relation on $A^*$

$w, w' \in A^*$: $w \sim w' \iff w = xy,\ w' = yx$

Example: abaa, baaa, aaab, aaba are conjugates

$A^\circ = A^*/\sim$ = set of all circular words ${}^\circ w = [w]_\sim$, $w \in A^*$
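Conjugacy is easy to check on strings. A small sketch, assuming a circular word is represented by its lexicographically smallest rotation (that choice of representative, and the function names, are assumptions made here for illustration):

```python
def conjugates(w):
    """All rotations yx of w = xy (the conjugacy class of w)."""
    return {w[i:] + w[:i] for i in range(len(w))} or {w}

def circular(w):
    """A canonical representative of the circular word: the smallest rotation."""
    return min(conjugates(w))

def are_conjugate(u, v):
    """u ~ v iff they have equal length and v occurs inside uu."""
    if len(u) != len(v):
        return False
    return v in u + u

cls = conjugates("abaa")
```

For the example on the slide, `conjugates("abaa")` contains exactly abaa, baaa, aaab and aaba.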

Circular languages

Page 18: Course outline

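Circular language: $C \subseteq A^\circ$, a set of equivalence classes

For $L \subseteq A^*$: $Cir(L) = \{ {}^\circ w \mid w \in L \}$ (circularization of $L$)

For $C \subseteq A^\circ$: $Lin(C) = \{ w \in A^* \mid {}^\circ w \in C \}$ (full linearization of $C$)

$L$ is a linearization of $C$ if $Cir(L) = C$.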

Circular languages

Page 19: Course outline

Definition

$FA^\circ = \{ C \subseteq A^\circ \mid \exists L \subseteq A^*,\ Cir(L) = C,\ L \in FA \}$, where $FA$ is a class in the Chomsky hierarchy

Theorem [Head, Păun, Pixton]

$C \in Reg^\circ \iff Lin(C) \in Reg$

Circular languages

Page 20: Course outline

Păun’s definition

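($A$ = finite alphabet, $I \subseteq A^\circ$ initial language)

SCPA = $(A, I, R)$, $R \subseteq A^* \mid A^* \,\$\, A^* \mid A^*$ rules

From ${}^\circ hu_1u_2,\ {}^\circ ku_3u_4 \in A^\circ$ and $r = u_1 \mid u_2 \,\$\, u_3 \mid u_4 \in R$:

cutting gives the linear words $u_2hu_1$ and $u_4ku_3$, and pasting gives ${}^\circ u_2hu_1u_4ku_3$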

Circular splicing systems

Page 21: Course outline

Definition

A circular splicing language C(SCPA) (i.e. a circular language generated by a splicing system SCPA) is the smallest circular language containing I and closed under the application of the rules in R.
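Păun's circular splicing step and the closure in this definition can be sketched as follows. Assumptions: circular words are stored as their smallest rotation, the symbol 1 of the slides is read as the empty word "", and the closure is cut off at `max_len` so that it terminates. All function names are hypothetical.

```python
def rotations(w):
    return {w[i:] + w[:i] for i in range(len(w))} or {w}

def canon(w):
    """Canonical representative of a circular word (smallest rotation)."""
    return min(rotations(w))

def circular_splice(w1, w2, rule):
    """Apply u1|u2 $ u3|u4 to the circular words of w1 and w2."""
    u1, u2, u3, u4 = rule
    out = set()
    for r1 in rotations(w1):
        # r1 must read u2 h u1, i.e. the circle opened between u1 and u2
        if len(r1) < len(u1) + len(u2):
            continue
        if not (r1.startswith(u2) and r1.endswith(u1)):
            continue
        for r2 in rotations(w2):
            # r2 must read u4 k u3
            if len(r2) < len(u3) + len(u4):
                continue
            if not (r2.startswith(u4) and r2.endswith(u3)):
                continue
            out.add(canon(r1 + r2))  # paste and close the circle again
    return out

def closure(initial, rules, max_len):
    """Smallest set containing the initial words and closed under the rules (bounded)."""
    lang = {canon(w) for w in initial}
    changed = True
    while changed:
        changed = False
        for a in list(lang):
            for b in list(lang):
                for rule in rules:
                    for c in circular_splice(a, b, rule):
                        if len(c) <= max_len and c not in lang:
                            lang.add(c)
                            changed = True
    return lang

# I = {aa, 1}, R = {aa | 1 $ 1 | aa}, bounded at length 6.
even_a = closure({"aa", ""}, [("aa", "", "", "aa")], 6)
```

The bounded closure of the initial words aa and 1 under the rule aa | 1 $ 1 | aa contains only circular words of even length over a.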

Circular splicing systems

Page 22: Course outline

Head’s definition: SCH = $(A, I, T)$, $T \subseteq A^* \times A^* \times A^*$ triples

From ${}^\circ hpxq,\ {}^\circ kuxv \in A^\circ$ and $(p, x, q), (u, x, v) \in T$, obtain ${}^\circ hpxvkuxq$.

SCPI = (A, I, R)

Ao (, ; ), (, ; ) R

oh h

oh , o h

h

Pixton’s definition: $R \subseteq A^* \times A^* \times A^*$ rules

h

Other splicing systems ($A$ = finite alphabet, $I \subseteq A^\circ$ initial language)

Page 23: Course outline

Characterize $FA^\circ \cap C(Fin, Fin)$

$C(Reg, Fin)$

$C(Fin, Fin)$: class of circular languages $C = C(SCPA)$ generated by SCPA with $I$ and $R$ both finite sets.

Problem

Page 24: Course outline

Theorem [Păun96]

F{Rego, CFo, REo}

R +add. hyp. (symmetry, reflexivity, self-splicing)

Theorem [Pixton95-96]

R Fin+add. hyp. (symmetry, reflexivity)

C(F, Fin) F

F{Rego, CFo, REo}

C(F, Reg) FC(Rego, Fin)Rego,

Problem

Page 25: Course outline

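[Figure: nested classes $Reg^\circ \subset CF^\circ \subset CS^\circ$ with the example languages ${}^\circ((aa)^*b)$, ${}^\circ(aa)^*$ and ${}^\circ(a^n b^n)$]

${}^\circ(aa)^*$: $I = \{{}^\circ aa, {}^\circ 1\}$, $R = \{aa \mid 1 \,\$\, 1 \mid aa\}$
${}^\circ(a^n b^n)$: $I = \{{}^\circ ab, {}^\circ 1\}$, $R = \{a \mid b \,\$\, b \mid a\}$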

Circular finite splicing languages

Page 26: Course outline
Page 27: Course outline

Circular automata

Page 28: Course outline

J. Kari and L. Kari, Context-free Recombinations, in: Words, Sequences, Languages: Where Computer Science, Biology and Linguistics Meet, C. Martin-Vide, V. Mitrana (Eds.), Kluwer, the Netherlands.

Finite automata for circular languages

Page 29: Course outline

Definition: Given a finite automaton $A$, the circular language K-accepted by $A$, $L(A)^\circ_K$, is the set of all words ${}^\circ w$ such that $A$ has a cycle labeled by $w$.

K-acceptance: The circular/linear language accepted by a finite automaton $A$ is defined as $L(A) \cup L(A)^\circ_K$, where $L(A)$ is the linear language accepted by $A$, defined in the usual way.

Definition: A circular/linear language $L \subseteq \Sigma^* \cup \Sigma^\circ$ is regular if there is a finite automaton $A$ that accepts the circular and linear parts of $L$, i.e. that accepts $L \cap \Sigma^*$ and $L \cap \Sigma^\circ$.
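K-acceptance of a circular word only asks for a cycle labeled by the word. A minimal sketch, assuming the automaton is given as a dictionary from (state, symbol) to a set of successor states; the representation, the function names and the example automaton are assumptions. Note that under this sketch the empty word labels the trivial cycle at every state.

```python
def run(transitions, start, w):
    """States reachable from `start` by reading w (nondeterministic)."""
    current = {start}
    for symbol in w:
        current = {t for s in current for t in transitions.get((s, symbol), ())}
    return current

def k_accepts(transitions, states, w):
    """True if some state q has a path q -> q labeled by w."""
    return any(q in run(transitions, q, w) for q in states)

# Hypothetical automaton: 0 -a-> 1, 1 -a-> 0, 1 -b-> 1.
states = {0, 1}
transitions = {(0, "a"): {1}, (1, "a"): {0}, (1, "b"): {1}}
```

Since a cycle can be entered at any of its states, a word is K-accepted exactly when all of its rotations are, which is what makes the definition suitable for circular words.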

Finite automata for circular languages

Page 30: Course outline

The following definition is equivalent to a definition given by Pixton: the circular language accepted by a finite automaton is the set of all words that label a loop containing at least one initial and one final state.

Definition: Given a finite automaton $A$, the circular language accepted by $A$, $L(A)^\circ_P$, is the set of all words ${}^\circ w$ such that $A$ has a cycle labeled by $w$ that contains at least one final state.

P-acceptance

Page 31: Course outline

The circular languages accepted by finite automata under the following definition coincide with the regular circular languages introduced by Head.

Given a finite automaton $A$, the circular language accepted by $A$, $L(A)^\circ_H$, is the set of all words ${}^\circ w$ such that $w = uv$ and $vu \in L(A)$.

Pixton has shown that if in addition we assume that the family of languages is closed under repetition (i.e., $w^n$ is in the language whenever $w$ is), then H-acceptance and P-acceptance are equivalent.

H-acceptance

Page 32: Course outline

Advantages of K-acceptance

The same automaton accepts both the linear and circular components of the language

K-acceptance

Page 33: Course outline

Counting problem

Page 34: Course outline

T. Head, Circular Suggestions for DNA Computing, in: Pattern Formation in Biology, Vision and Dynamics, Eds. A.Carbone, M Gromov and P. Prusinkiewicz, World Scientific,Singapore , 2000, pp. 325-335.

J. Kari, A Cryptosystem Based on Propositional Logic, in: Machines, Languages and Complexity, 5th International Meeting of Young Computer Scientists, Czechoslovakia, Nov. 14-18, 1988, Eds. J. Dassow and J. Kelemen, LNCS 381, Springer, 1989, pp. 210-219.

Rani Siromoney, Bireswar Das, DNA Algorithm for Breaking a Propositional Logic Based Cryptosystem, Bulletin of the EATCS, Number 79, February 2003, pp.170-176.

Sources

Page 35: Course outline

Introducing CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model

Combine features of Divide-Delete-Drop (D-D-D) (Leiden) and CUT-EXPAND-LIGATE (C-E-L) (Binghamton) to form the CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model. This enables us to get an aqueous solution to #3SAT, which is a counting problem and known to be in IP.

#3SAT is defined as follows:
Instance: $F$, a propositional formula of the form $F = C_1 \wedge C_2 \wedge \dots \wedge C_m$, where $C_i$, $i = 1, 2, \dots, m$, are clauses. Each $C_i$ is of the form $(l_{i1} \vee l_{i2} \vee l_{i3})$, where $l_{ij}$, $j = 1, 2, 3$, are literals over the set of variables $\{x_1, x_2, \dots, x_n\}$.
Question: What is the number of truth assignments that satisfy $F$?
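For small instances the counting question can be answered by plain enumeration, which is useful for checking what a molecular procedure should return. This is a brute-force sketch and not the C-D-E-L procedure; the clause encoding (integer k for the variable x_k, -k for its negation) and the function name are assumptions.

```python
from itertools import product

def count_satisfying(clauses, n):
    """Count truth assignments of x1..xn satisfying every clause.

    clauses: list of tuples of nonzero integers; k stands for x_k,
    -k for the negation of x_k (an encoding assumed for this sketch).
    """
    count = 0
    for bits in product([False, True], repeat=n):
        # A clause holds if at least one of its literals is true.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            count += 1
    return count

# F = (x1 or x2 or x3) and (not x1 or not x2 or not x3)
formula = [(1, 2, 3), (-1, -2, -3)]
```

The enumeration takes 2^n steps, which is exactly the cost that the parallelism of the aqueous approach is meant to absorb.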

C-D-E-L model

Page 36: Course outline

Standard double stranded DNA cloning plasmid are commercially available.

A plasmid is a circular molecule approximately 3 kb. It contains a sub-segment, MCS (multiple cloning site) of approximately 175 base pairs that can be removed using a pair of restriction enzyme sites that flank the segment.

The MCS contains pair-wise disjoint sites at which restriction enzymes act such that each produces a 5’ overhang.

Data register molecule

Page 37: Course outline

In C-D-E-L, a segment of the plasmid used is of the form

…c1s1c1…c2s2c2…cnsncn…

where the ci are called sites, chosen such that no other subsequence of the plasmid matches them, and the si are called stations, i = 1, …, n.

In D-D-D, lengths of stations are required to be the same.

However, in C-D-E-L the lengths of the stations are all different, which is fundamental in solving #3SAT.

Bio-molecular operations used in C-D-E-L are similar to the operations in C-E-L

C-D-E-L model

Page 38: Course outline

$x_1, \dots, x_n$: the variables in $F$; $\bar{x}_1, \dots, \bar{x}_n$: their negations
$s_i$: station associated with $x_i$
$\bar{s}_i$: station associated with $\bar{x}_i$
$c_i$: site associated with station $s_i$
$\bar{c}_i$: site associated with station $\bar{s}_i$
$v_i$: length of the station associated with $x_i$, $i = 1, \dots, n$
$v_{n+j}$: length of the station associated with the literal $\bar{x}_j$, $j = 1, \dots, n$

Choose the stations in such a way that the sequence $[v_1, \dots, v_{2n}]$ satisfies $\sum_{i=1}^{k} v_i < v_{k+1}$, $k = 1, \dots, 2n-1$, i.e. it is a super-increasing (easy) knapsack sequence. From a sum, the sub-sequence is efficiently recovered.
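The recovery step can be illustrated with a greedy scan from the largest element down. A minimal sketch; the station lengths in the example are made-up numbers, and the function names are assumptions.

```python
def is_superincreasing(v):
    """Check that every element exceeds the sum of all earlier ones."""
    total = 0
    for x in v:
        if x <= total:
            return False
        total += x
    return True

def recover(v, total):
    """Return the indices of the unique sub-sequence of v that sums to `total`."""
    chosen = []
    for i in range(len(v) - 1, -1, -1):
        # The largest element that still fits must belong to the sum.
        if v[i] <= total:
            chosen.append(i)
            total -= v[i]
    if total != 0:
        raise ValueError("total is not a sum of a sub-sequence")
    return sorted(chosen)

# Hypothetical station lengths for n = 3 (2n = 6 stations).
lengths = [3, 5, 10, 20, 40, 80]
```

Because the greedy choice is forced at every step, two different sub-sequences can never give the same total length.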

Design

Page 39: Course outline

Solution in Cn is analyzed by gel separation

If more than one solution is present, they will be of different lengths, thus will form separate bands

By counting number of bands we count the number of satisfying assignments.

Furthermore, from the length of each satisfying assignment, the exact assignment is read.

This can be done since the stations have lengths from an easy knapsack sequence: any subsequence of an easy knapsack sequence has a different sum from the sums of other subsequences.

Solution

Page 40: Course outline

C-D-E-L model

Page 41: Course outline
Page 42: Course outline

Thus the solution to #3SAT, viz. finding the number of satisfying assignments, is effectively done. Moreover, reading the truth assignments is a great advantage in breaking the cryptosystem based on propositional logic.

Solution

Page 43: Course outline

Advantage over previous method of attack

In the cryptanalytic attack proposed earlier, modifying D-D-D, it was required to execute the DNA algorithm for each bit in the crypto-text

But in the present method, using C-D-E-L (combining features of D-D-D and C-E-L), apply 3-SAT on P and read any satisfying assignment from the final solution

This gives an equivalent public key, which amounts to breaking the cryptosystem

Advantage

Page 44: Course outline

H-system Lipton[94-95a-95b] Formalization and generalization of Adleman’s approach to other NP-complete problems.

Ex H-system Circular H-system Sticker system P-system

Splicing systems so far

Page 45: Course outline

For computational strength Turing Equivalence Expansion Finiteness & Regularity More Operator Formalization

To confirm homogeneity HPP solving & AGL

Splicing systems so far

Page 46: Course outline

Molecular application

Page 47: Course outline

Separating and fusing DNA strands
Lengthening of DNA
Shortening DNA
Cutting DNA
Multiplying DNA

Operations of DNA molecules

Page 48: Course outline

Denaturation: separating the single strands without breaking them
hydrogen bonding is weaker than phosphodiester bonding
heat DNA (85° - 90° C)

Renaturation: slowly cooling down
annealing of matching, separated strands

Separating and fusing DNA strands

Page 49: Course outline

Machinery for Nucleotide Manipulation

Enzymes are proteins that catalyze chemical reactions.

Enzymes are very specific.

Enzymes speed up chemical reactions extremely efficiently (speedup: $10^{12}$).

Nature has created a multitude of enzymes that are useful in processing DNA.

Enzymes

Page 50: Course outline

DNA polymerase enzymes add nucleotides to a DNA molecule

Requirements:
single-stranded template
primer, bonded to the template
3´-hydroxyl end available for extension

Note: Terminal transferase needs no primer.

Lengthening DNA

Page 51: Course outline

DNA nucleases are enzymes that degrade DNA.

DNA exonucleases cleave (remove) nucleotides one at a time from the ends of the strands

Example: Exonuclease III 3´-nuclease degrading in 3´-5´direction

Shortening DNA

Page 52: Course outline

DNA nucleases are enzymes that degrade DNA.

DNA exonucleases cleave (remove) nucleotides one at a time from the ends of the strands

Example: Bal31 removes nucleotides from both strands

Shortening DNA

Page 53: Course outline

DNA nucleases are enzymes that degrade DNA.

DNA endonucleases destroy internal phosphodiester bonds.
Example: S1 cuts only single strands or within single strand sections

Restriction endonucleases are much more specific: they cut only double strands, at a specific set of sites (e.g. EcoRI).

Cutting DNA

Page 54: Course outline

Amplification of a „small“ amount of a specific DNA fragment, lost in a huge amount of other pieces.

„Needle in a haystack“
Solution: PCR = Polymerase Chain Reaction
devised by Kary Mullis in 1985 (Nobel Prize)
a very efficient molecular copy machine

Multiplying DNA

Page 55: Course outline

Start with a solution containing the following ingredients:

the target DNA molecule

primers (synthetic oligo-nucleotides), complementary to the terminal sections

polymerase, heat resistant

nucleotides

PCR - initialisation

Page 56: Course outline

Solution heated close to boiling temperature.

The hydrogen bonds between the two strands are broken, and the double strands separate into single strand molecules.

PCR – denaturation

Page 57: Course outline

The solution is cooled down (to about 55° C).

Primers anneal to their complementary borders.

PCR - priming

Page 58: Course outline

The solution is heated again (to about 72° C).

Polymerase will extend the primers, using nucleotides available in the solution.

Two complete strands of the target DNA molecule are produced.

PCR - extension

Page 59: Course outline

$2^n$ copies after $n$ steps

PCR – copying

Page 60: Course outline

Measuring the Length of DNA Molecules

DNA molecules are negatively charged.

Placed in an electric field, they will move towards the positive electrode.

The negative charge is proportional to the length of the DNA molecule.

The force needed to move the molecule is proportional to its length.

A gel makes the molecules move at different speeds.

DNA molecules are invisible, and must be marked (ethidium bromide, radioactive)

Gel electrophoresis

Page 61: Course outline

Gel electrophoresis

Page 62: Course outline

reading the exact sequence of nucleotides comprising a given DNA molecule

based on the polymerase action of extending a primed single stranded template

nucleotide analogues chemically modified e.g., replace 3´-hydroxyl group (3´-OH) by 3´-hydrogen atom (3´-H)

dideoxynucleotides: ddA, ddT, ddC, ddG
Sanger method, dideoxy enzymatic method

Sequencing

Page 63: Course outline

Objective: We want to sequence a single stranded molecule α.

Preparation: We extend α at the 3´ end by a short (20 bp) sequence γ, which will act as the W-C complement for the primer compl(γ). Usually, the primer is labelled (radioactively, or marked fluorescently).

This results in a molecule β´ = 3´-γα.

Sequencing

5' ATTAGACGTCCGTGCAATGC 3'

3'ACGTTACG 5'

Page 64: Course outline

Sequencing

Page 65: Course outline

4 tubes are prepared: Tube A, Tube T, Tube C, Tube G
Each of them contains:
β molecules
primers, compl(γ)
polymerase
nucleotides A, T, C, and G
Tube A contains a limited amount of ddA.
Tube T contains a limited amount of ddT.
Tube C contains a limited amount of ddC.
Tube G contains a limited amount of ddG.

Sequencing

Page 66: Course outline

Structure of ddTTP

Page 67: Course outline

Termination with ddTTP

Page 68: Course outline

Reaction in Tube A

The polymerase enzyme extends the primer of β´, using the nucleotides present in Tube A: ddA, A, T, C, G.

using only A, T, C, G: β´ is extended to the full duplex.

using ddA rather than A: complementing will end at the position of the ddA nucleotide.

Sequencing

Page 69: Course outline

Sequencing

Page 70: Course outline

Sequencing - stopping

Page 71: Course outline

Template in every tube: GCCTGCAGATTA
Tube A: CGGA, CGGACGTCTA, CGGACGTCTAA
Tube T: CGGACGT, CGGACGTCT, CGGACGTCTAAT
Tube C: C, CGGAC, CGGACGTC
Tube G: CG, CGG, CGGACG

Sequencing - results

Page 72: Course outline

Sequencing – reading the results
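Reading the result amounts to sorting all terminated fragments by length and noting which tube each length came from. A small sketch using the fragments listed for Tubes A, T, C and G; the function names are assumptions, and the lengths stand in for band positions on the gel.

```python
def read_sequence(tubes):
    """Reconstruct the synthesized strand from terminated fragments.

    tubes maps a base to the fragments found in the tube that
    contained the matching dideoxynucleotide.
    """
    base_at_length = {}
    for base, fragments in tubes.items():
        for fragment in fragments:
            # A fragment of length n ends with this tube's base at position n.
            base_at_length[len(fragment)] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))

def complement(strand):
    """Position-wise Watson-Crick complement."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[b] for b in strand)

tubes = {
    "A": ["CGGA", "CGGACGTCTA", "CGGACGTCTAA"],
    "T": ["CGGACGT", "CGGACGTCT", "CGGACGTCTAAT"],
    "C": ["C", "CGGAC", "CGGACGTC"],
    "G": ["CG", "CGG", "CGGACG"],
}
synthesized = read_sequence(tubes)
```

The strand read this way is the complement of the template, so complementing it position by position gives back the template shown in each tube.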