CS4030: Bio-Computing Revision Lecture. DNA Replication Prior to cell division, all the genetic instructions must be copied so that each new cell will

CS4030: Bio-Computing

Revision Lecture

DNA Replication

• Prior to cell division, all the genetic instructions must be “copied” so that each new cell will have a complete set

• DNA polymerase is the enzyme that copies DNA
  – Reads the old strand in the 3´ to 5´ direction

Over time, genes accumulate mutations

• Environmental factors
  – Radiation
  – Oxidation
• Mistakes in replication or repair
• Deletions, duplications, insertions, inversions, point mutations

Deletions

• Codon deletion:
    ACG ATA GCG TAT GTA TAG CCG…
  – Effect depends on the protein, position, etc.
  – Almost always deleterious
  – Sometimes lethal

• Frame shift mutation:
    ACG ATA GCG TAT GTA TAG CCG…
    ACG ATA GCG ATG TAT AGC CG?…
  – Almost always lethal

Why align sequences?

• The draft human genome is available
• Automated gene finding is possible
• Gene: AGTACGTATCGTATAGCGTAA
  – What does it do?
• One approach: Is there a similar gene in another species?
  – Align sequences with known genes
  – Find the gene with the “best” match

Are there other sequences like this one?

1) Huge public databases - GenBank, Swissprot, etc.

2) Sequence comparison is the most powerful and reliable method to determine evolutionary relationships between genes

3) Similarity searching is based on alignment

4) BLAST and FASTA provide rapid similarity searching

a. rapid = approximate (heuristic)

b. false positive and false negative scores

Similarity ≠ Homology

1) 25% similarity over ≥ 100 AAs is strong evidence for homology

2) Homology is an evolutionary statement which means “descent from a common ancestor”
  – common 3D structure
  – usually common function
  – homology is all or nothing; you cannot say "50% homologous"

Comparing two sequences

• Point mutations, easy:
    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGATTCGCCCTATCGTCTATCT

• Indels are difficult, must align sequences:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGATTCGCATCGTCTATCT

    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT

Scoring a sequence alignment

• Match score: +1
• Mismatch score: +0
• Gap penalty: –1

    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT

• Matches: 18 × (+1)
• Mismatches: 2 × 0
• Gaps: 7 × (–1)

Score = +11
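The scoring rule on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `score_alignment` and its parameter names are illustrative, and it assumes the two rows are already aligned, with `-` marking a gap.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two already-aligned rows of equal length; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # one penalty per gap position
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # 18 matches, 2 mismatches, 7 gaps -> 11
```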

Origination and length penalties

• We want to find alignments that are evolutionarily likely.

• Which of the following alignments seems more likely to you?

    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGAT-------ATAGTCTATCT

    ACGTCTGATACGCCGTATAGTCTATCT
    AC-T-TGA--CG-CGT-TA-TCTATCT

• We can achieve this by penalizing more for a new gap than for extending an existing gap

Scoring a sequence alignment (2)

• Match/mismatch score: +1/+0
• Origination/length penalty: –2/–1

    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT

• Matches: 18 × (+1)
• Mismatches: 2 × 0
• Origination: 2 × (–2)
• Length: 7 × (–1)

Score = +7
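Origination and length penalties can be sketched the same way. The function below is an illustrative assumption (`score_affine` is not a name from the slides): every run of gap positions pays the origination penalty once, plus the length penalty for each position in the run. As a simplification, a gap in one row immediately followed by a gap in the other row is counted as a single run.

```python
def score_affine(top, bottom, match=1, mismatch=0, origination=-2, length=-1):
    """Score aligned rows with separate gap-origination and gap-length penalties."""
    total = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            if not in_gap:
                total += origination   # charged once, when a gap run starts
            total += length            # charged for every gap position
            in_gap = True
        else:
            total += match if a == b else mismatch
            in_gap = False
    return total


reference = "ACGTCTGATACGCCGTATAGTCTATCT"
print(score_affine(reference, "----CTGATTCGC---ATCGTCTATCT"))  # 18 - 4 - 7 -> 7

# The two candidate alignments from the "Origination and length penalties" slide:
one_long_gap = score_affine(reference, "ACGTCTGAT-------ATAGTCTATCT")
many_short_gaps = score_affine(reference, "AC-T-TGA--CG-CGT-TA-TCTATCT")
print(one_long_gap > many_short_gaps)
```

Both candidates contain 20 matches and 7 gap positions, so a flat gap penalty cannot separate them; the origination penalty is what favours the single long gap.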

Scoring Similarity

1) Can only score aligned sequences

2) DNA is usually scored as identical or not

3) modified scoring for gaps - single vs. multiple base gaps (gap extension)

4) AAs have varying degrees of similarity
  – a. # of mutations to convert one to another
  – b. chemical similarity
  – c. observed mutation frequencies

5) PAM matrix calculated from observed mutations in protein families

DNA Scoring Matrix

Identity

         A    T    C    G
    A    1    0    0    0
    T    0    1    0    0
    C    0    0    1    0
    G    0    0    0    1

BLAST

         A    T    C    G
    A    5   -4   -4   -4
    T   -4    5   -4   -4
    C   -4   -4    5   -4
    G   -4   -4   -4    5

Transition/Transversion

         A    T    C    G
    A    1   -5   -5   -1
    T   -5    1   -1   -5
    C   -5   -1    1   -5
    G   -1   -5   -5    1

The dynamic programming concept

• Suppose we are aligning:
    ACTCG
    ACAGTAG

• Last position choices:

    G
    G    +1    then align ACTC with ACAGTA

    G
    -    –1    then align ACTC with ACAGTAG

    -
    G    –1    then align ACTCG with ACAGTA

We can use a table

• Suppose we are aligning: A with A…

             A
         0  -1
    A   -1

Needleman-Wunsch: Step 1

• Each sequence along one axis
• Gap penalty multiples in first row/column
• 0 in [1,1]

Needleman-Wunsch: Step 2

• Vertical/Horiz. move: Score + (simple) gap penalty
• Diagonal move: Score + match/mismatch score
• Take the MAX of the three possibilities

Needleman-Wunsch: Step 2 (cont’d)

• Fill out the rest of the table likewise…


The optimal alignment score is calculated in the lower-right corner
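The table-filling steps can be sketched as follows. This is a minimal illustration under the scoring used earlier (match +1, mismatch 0, simple gap penalty –1); the function name `needleman_wunsch_table` and the choice of which sequence runs along the top are assumptions made for the example.

```python
def needleman_wunsch_table(top, left, match=1, mismatch=0, gap=-1):
    """Fill the global alignment table; rows follow `left`, columns follow `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    # Step 1: multiples of the gap penalty in the first row and column.
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        table[i][0] = i * gap
    # Step 2: each cell is the MAX of the diagonal, vertical and horizontal moves.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # diagonal: align both positions
                table[i - 1][j] + gap,       # vertical: gap in the top sequence
                table[i][j - 1] + gap,       # horizontal: gap in the left sequence
            )
    return table


table = needleman_wunsch_table("ACTCG", "ACAGTAG")
print(table[-1][-1])  # lower-right corner holds the optimal score -> 2
```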

But what is the optimal alignment?

• To reconstruct the optimal alignment, we must determine where the MAX at each step came from…

A path corresponds to an alignment

• Vertical move = GAP in top sequence
• Horizontal move = GAP in left sequence
• Diagonal move = ALIGN both positions
• One path from the previous table:
• Corresponding alignment (start at the end):

    AC--TCG
    ACAGTAG

Score = +2
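A traceback can be sketched by refilling the table and then walking back from the lower-right corner, asking at each cell which move produced the MAX. The function name `global_align` and the tie-breaking order (diagonal, then vertical, then horizontal) are assumptions; when several paths tie, the alignment returned may differ from the one drawn on the slide while having the same score.

```python
def global_align(top, left, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_top, aligned_left) for one optimal global alignment."""
    rows, cols = len(left) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Walk back from the lower-right corner to the upper-left corner.
    i, j = len(left), len(top)
    out_top, out_left = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and left[i - 1] == top[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_top.append(top[j - 1])       # diagonal: align both positions
            out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_top.append("-")              # vertical: gap in the top sequence
            out_left.append(left[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])       # horizontal: gap in the left sequence
            out_left.append("-")
            j -= 1
    return score[-1][-1], "".join(reversed(out_top)), "".join(reversed(out_left))


best, row_top, row_left = global_align("ACTCG", "ACAGTAG")
print(best)
print(row_top)
print(row_left)
```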

Semi-global alignment

• Suppose we are aligning:
    GCG
    GGCG

• Which do you prefer?

    G-CG      -GCG
    GGCG      GGCG

• Semi-global alignment allows gaps at the ends for free.
  – Initialize first row and column to all 0’s
  – Allow free horizontal/vertical moves in last row and column
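The two changes for semi-global alignment can be sketched as below. This is an illustrative version, with `semi_global_score` as an assumed name: the first row and column stay at 0, and free moves along the last row and column are modelled by taking the best value found anywhere in the last row or last column.

```python
def semi_global_score(top, left, match=1, mismatch=0, gap=-1):
    """Best alignment score when gaps at either end of either sequence are free."""
    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]   # first row and column remain 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + pair,
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)
    # Free trailing gaps: the alignment may stop anywhere in the last row or column.
    last_row = max(table[-1])
    last_col = max(row[-1] for row in table)
    return max(last_row, last_col)


print(semi_global_score("GCG", "GGCG"))  # -GCG over GGCG: 3 matches, free end gap -> 3
```

Under global scoring both candidate alignments on the slide score 3 – 1 = +2; only the semi-global rule prefers the one whose gap sits at the end.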

Local alignment

• Global alignments – score the entire alignment
• Semi-global alignments – allow unscored gaps at the beginning or end of either sequence
• Local alignment – find the best matching subsequence
    CGATG
    AAATGGA
• This is achieved by allowing a 4th alternative at each position in the table: zero, if the other alternatives are negative
• Smith-Waterman Algorithm (1981).

Local alignment

• Mismatch = –1 this time

    CGATG
    AAATGGA
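The fourth alternative can be sketched as one extra argument to the MAX. This is a minimal score-only illustration of the Smith-Waterman idea; the function name and the gap penalty of –1 are assumptions carried over from the earlier slides, and the mismatch score is –1 as stated above.

```python
def smith_waterman_score(top, left, match=1, mismatch=-1, gap=-1):
    """Best local alignment score; every cell is floored at zero."""
    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            table[i][j] = max(
                0,                           # 4th alternative: start afresh here
                table[i - 1][j - 1] + pair,
                table[i - 1][j] + gap,
                table[i][j - 1] + gap,
            )
            best = max(best, table[i][j])    # the best cell can be anywhere
    return best


print(smith_waterman_score("CGATG", "AAATGGA"))  # shared subsequence ATG -> 3
```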

CBA - Artificial Immune Systems

Classical Immunity

• The purpose of the immune system is defence
• Innate and acquired immunity
  – Innate is the first line of defence. Germ line encoded (passed from parents) and is quite ‘static’ (but not totally static)
  – Adaptive (acquired). Somatic (cellular) and is acquired by the host over the lifetime. Very dynamic.
  – These two interact and affect each other


Multiple layers of the immune system

[Figure: pathogens meet successive layers of defence: skin, biochemical barriers, innate immune response (phagocyte), adaptive immune response (lymphocytes)]


Innate Immunity

• May take days to remove an infection; if it fails, then the adaptive response may take over
• Macrophages and neutrophils are actors
  – Bind to common (known) things. This knowledge has been evolved and passed from generation to generation.


Processes within the Immune System (very basically)

• Negative Selection

– Censoring, in the thymus gland, of T-cells that recognise self

• Defining normal system behavior

• Clonal Selection

– Proliferation and differentiation of cells when they have recognised something

• Generalise and learn

• Self vs Non-Self


Clonal Selection




Immune Responses

[Figure: antibody concentration against time, in three phases]

• Primary response (antigen Ag1): lag, then response to Ag1
• Secondary response (antigens Ag1, Ag2): lag, response to Ag2, response to Ag1
• Cross-reactive response (antigen Ag1 + Ag3): lag, response to Ag1 + Ag3


A Framework for AIS [De Castro and Timmis, 2002]

[Figure: layered framework leading from Application to Solution through three AIS layers: Representation, Affinity, Algorithms]

• Representation: Shape-Space
  – Binary
  – Integer
  – Real-valued
  – Symbolic
• Affinity
  – Euclidean
  – Manhattan
  – Hamming
• Algorithms
  – Bone Marrow Models
  – Clonal Selection
  – Negative Selection
  – Positive Selection
  – Immune Network Models


The Representation Layer

Shape-Space

• An antibody can recognise any antigen whose complement lies within a small surrounding region of width ε (the cross-reactivity threshold)

• This results in a volume V_ε known as the recognition region of the antibody

[Figure: shape-space S with volume V, containing antibodies each surrounded by a recognition region V_ε] [Perelson, 1989]


The Affinity Layer

• Computationally, the degree of interaction of an antibody-antigen or antibody-antibody pair can be evaluated by a distance or affinity measure
• The choice of affinity measure is crucial:
  – It alters the shape-space topology
  – It will introduce an inductive bias into the algorithm
  – It needs to take into account the data-set used and the problem you are trying to solve

Affinity

• Affinity through shape similarity. On the left, a region where all antigens present the same affinity with the given antibody. On the right, antigens in region b have a higher affinity than those in region a.

[Figure: left, an antibody (Ab) with geometric region a; right, geometric regions a and b]


Hamming Shape Space (The Affinity Layer)

• 1 if Ab_i ≠ Ag_i, 0 otherwise (XOR operator)

    Ab:   0 0 1 1 0 0 1 1
    Ag:   1 1 1 0 1 1 0 1


Hamming Shape Space (The Affinity Layer)

• (a) Hamming distance

    Ab:   0 0 1 1 0 0 1 1
    Ag:   1 1 1 0 1 1 0 1
    XOR:  1 1 0 1 1 1 1 0      Affinity: 6

• (b) r-contiguous bits rule

    Ab:   0 0 1 1 0 0 1 1
    Ag:   1 1 1 0 1 1 0 1
    XOR:  1 1 0 1 1 1 1 0      Affinity: 4
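Both affinity measures follow directly from the XOR of the two bitstrings, as this short sketch shows. The function names are illustrative; the r-contiguous version is read here as the length of the longest run of 1s in the XOR, which is the reading that reproduces the value 4 above.

```python
def xor_bits(ab, ag):
    """Bitwise XOR: 1 where antibody and antigen differ, 0 otherwise."""
    return [a ^ g for a, g in zip(ab, ag)]


def hamming_affinity(ab, ag):
    """(a) Hamming distance: total number of differing positions."""
    return sum(xor_bits(ab, ag))


def r_contiguous_affinity(ab, ag):
    """(b) r-contiguous bits: longest run of consecutive differing positions."""
    longest = run = 0
    for bit in xor_bits(ab, ag):
        run = run + 1 if bit else 0
        longest = max(longest, run)
    return longest


ab = [0, 0, 1, 1, 0, 0, 1, 1]
ag = [1, 1, 1, 0, 1, 1, 0, 1]
print(xor_bits(ab, ag))               # [1, 1, 0, 1, 1, 1, 1, 0]
print(hamming_affinity(ab, ag))       # 6
print(r_contiguous_affinity(ab, ag))  # 4
```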


Mutation - Binary

• Single-point mutation

    Original string:   1 0 0 0 1 1 1 0
    Bit to be mutated: the fifth bit
    Mutated string:    1 0 0 0 0 1 1 0

• Multi-point mutation

    Original string:    1 0 0 0 1 1 1 0
    Bits to be mutated: the first and fifth bits
    Mutated string:     0 0 0 0 0 1 1 0
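Binary mutation amounts to flipping chosen bits. The sketch below is illustrative only: `flip_bits`, `single_point_mutation` and `multi_point_mutation` are assumed names, and the per-bit `rate` parameter of the multi-point version is one possible way of choosing which bits to mutate.

```python
import random


def flip_bits(bits, positions):
    """Return a copy of `bits` with the given 0-based positions inverted."""
    mutated = list(bits)
    for p in positions:
        mutated[p] ^= 1
    return mutated


def single_point_mutation(bits, rng=random):
    """Flip exactly one randomly chosen bit."""
    return flip_bits(bits, [rng.randrange(len(bits))])


def multi_point_mutation(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return flip_bits(bits, [i for i in range(len(bits)) if rng.random() < rate])


original = [1, 0, 0, 0, 1, 1, 1, 0]
print(flip_bits(original, [4]))     # slide's single-point example: [1, 0, 0, 0, 0, 1, 1, 0]
print(flip_bits(original, [0, 4]))  # slide's multi-point example:  [0, 0, 0, 0, 0, 1, 1, 0]
print(single_point_mutation(original))
```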


Affinity Proportional Mutation

• Affinity maturation is controlled
  – Proportional to antigenic affinity
  – $\alpha(D^*) = \exp(-\rho D^*)$
  – $\alpha$ = mutation rate
  – $D^*$ = affinity
  – $\rho$ = control parameter

[Figure: mutation rate $\alpha$ (0 to 1) against affinity $D^*$ (0 to 1) for $\rho$ = 5, 10 and 20]
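The mutation-rate curve can be tabulated in a few lines. This sketch assumes the symbols lost from the slide are α for the mutation rate and ρ for the control parameter, so that α(D*) = exp(–ρ D*) with the affinity D* normalised to the range 0 to 1; `mutation_rate` is an illustrative name.

```python
import math


def mutation_rate(affinity, rho):
    """alpha(D*) = exp(-rho * D*), with D* assumed to lie in [0, 1]."""
    return math.exp(-rho * affinity)


# Tabulate the three curves from the plot at a few affinity values.
for rho in (5, 10, 20):
    rates = [round(mutation_rate(d, rho), 4) for d in (0.0, 0.1, 0.5, 1.0)]
    print(f"rho={rho}: {rates}")
```

Higher affinity gives a lower mutation rate, and a larger control parameter makes the rate fall away more steeply.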


The Algorithms Layer

• Bone Marrow models (Hightower, Oprea, Kim)
• Clonal Selection
  – Clonalg (De Castro), B-Cell (Kelsey)
• Negative Selection
  – Forrest, Dasgupta, Kim, …
• Network Models
  – Continuous models: Jerne, Farmer
  – Discrete models: RAIN (Timmis), AiNET (De Castro)


Clonal Selection – CLONALG

1. Initialisation
  • Create a random population of individuals (P)
2. Antigenic presentation
  • For each antigenic pattern in the data-set S do:
  a. Affinity evaluation
    • Present it to the population P and determine its affinity with each element of the population
  b. Clonal selection and expansion
    • Select n highest affinity elements of P
    • Generate clones proportional to their affinity with the antigen (higher affinity = more clones)
  c. Affinity maturation
    • Mutate each clone
    • High affinity = low mutation rate and vice versa
    • Add mutated individuals to population P
    • Reselect best individual to be kept as memory m of the antigen presented
  d. Metadynamics
    • Replace a number r of individuals with low affinity with randomly generated new ones
3. Cycle
  • Repeat step 2 until a certain stopping criterion is met
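The CLONALG steps can be sketched as a small Python loop. This is an illustration under stated assumptions, not De Castro's reference implementation: affinity is taken to be Hamming distance, the number of clones is rank-based (better rank, more clones), the mutation rate is exp(–ρ × normalised affinity), the stopping criterion is a fixed number of generations, and all parameter names and default values are hypothetical.

```python
import math
import random


def affinity(ab, ag):
    """Hamming distance between two bitstrings (higher = more complementary)."""
    return sum(a != g for a, g in zip(ab, ag))


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def clonalg(antigens, pop_size=20, n_select=5, n_replace=4,
            rho=5.0, generations=30, seed=0):
    rng = random.Random(seed)
    length = len(antigens[0])

    def random_cell():
        return [rng.randint(0, 1) for _ in range(length)]

    # 1. Initialisation: random population P and one memory cell per antigen.
    population = [random_cell() for _ in range(pop_size)]
    memory = [max(population, key=lambda ab: affinity(ab, ag)) for ag in antigens]

    for _ in range(generations):                        # 3. Cycle
        for k, ag in enumerate(antigens):               # 2. Antigenic presentation
            def aff(ab):
                return affinity(ab, ag)                 # 2a. Affinity evaluation

            best = sorted(population, key=aff, reverse=True)[:n_select]
            clones = []
            for rank, ab in enumerate(best, start=1):   # 2b. Selection and expansion
                n_clones = max(1, n_select // rank)     # better rank, more clones
                rate = math.exp(-rho * aff(ab) / length)  # 2c. High affinity, low rate
                clones += [mutate(ab, rate, rng) for _ in range(n_clones)]
            # Keep the best cell seen so far as the memory of this antigen.
            memory[k] = max(clones + [memory[k]], key=aff)
            # 2d. Metadynamics: drop the weakest cells, add random newcomers.
            survivors = sorted(population + clones, key=aff, reverse=True)
            population = survivors[:pop_size - n_replace]
            population += [random_cell() for _ in range(n_replace)]
    return memory


S = [[0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
     [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]]
memory = clonalg(S)
for ag, cell in zip(S, memory):
    print(cell, affinity(cell, ag))
```

Because the memory cell is only replaced by a clone of equal or higher affinity, its affinity for the corresponding antigen never decreases from one generation to the next.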



Naive Application of Clonal Selection

• Generate a set of detectors capable of identifying simple digits

• Represented as a simple bitmap

$$S = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}$$


Representation

• Each individual is a bitstring
• Use Hamming distance as affinity metric

$$M = \begin{bmatrix} 12 & 2 & 1 & 11 & 9 \\ 2 & 12 & 9 & 3 & 1 \end{bmatrix}$$

Evolution of Detectors

[Figure: two rows of bitmaps, Clone 1 to Clone 3: the clones, and the mutated clones]


Negative Selection Algorithms

• Define Self as a normal pattern of activity or stable behavior of a system/process
  – A collection of logically split segments (equal-size) of pattern sequence.
  – Represent the collection as a multiset S of strings of length l over a finite alphabet.
• Generate a set R of detectors, each of which fails to match any string in S.
• Monitor new observations (of S) for changes by continually matching the detectors against representatives of S. If any detector ever matches, a change (or deviation) must have occurred in system behavior.
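The generate-and-monitor scheme can be sketched as follows. The matching rule is an assumption: a detector is taken to match a string when the two agree in at least r contiguous positions, which is one common form of the r-contiguous rule. The function names, the example self strings and the parameter values are all hypothetical.

```python
import random


def matches(detector, string, r):
    """True if the two strings agree in at least r contiguous positions."""
    run = 0
    for d, s in zip(detector, string):
        run = run + 1 if d == s else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r, seed=0, max_tries=100000):
    """Keep random candidates that fail to match every string in the self set."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)   # censoring passed: never matches self
    return detectors


def is_anomalous(observation, detectors, r):
    """A change is flagged as soon as any detector matches the observation."""
    return any(matches(d, observation, r) for d in detectors)


self_set = ["00101100", "00101101", "00111100"]
detectors = generate_detectors(self_set, n_detectors=5, length=8, r=4)
print(detectors)
print(is_anomalous("00101100", detectors, r=4))  # a self string -> False
print(is_anomalous("11010011", detectors, r=4))
```

By construction no detector can fire on a self string, so any match signals a deviation from self; how many non-self strings are caught depends on r and on the number of detectors.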



Illustration of NS Algorithm:

[Figure: self and non-self regions of the string space, with two example strings for r = 2: 10111000 (match) and 10111101 (don’t match)]



Negative Selection

• Cross-reactivity threshold = 1

$$M = \begin{bmatrix} 12 & 2 & 1 & 11 & 9 \\ 2 & 12 & 9 & 3 & 1 \end{bmatrix}$$

• Here M[1,1], M[1,4] and M[2,2] are above the threshold
• Add these to Available repertoire
• Eliminate the rest.

QR Motivations

• Problems with RBS
  – Reasoning from First Principles
  – Dangers with “nearest approximation”
• Second Generation Expert Systems
  – Use deep knowledge
  – Provide explanations of reasoning process
• Commonsense reasoning
  – Capture how humans reason
  – Enable use of appropriate causality
• Model reuse
  – Improved ease of ES maintenance

Arithmetic Operations

• Sign Algebra

    MULT |  +   0   -
    -----+-----------
      +  |  +   0   -
      0  |  0   0   0
      -  |  -   0   +

    DIV  |  +   0   -        (row ÷ column; X = undefined)
    -----+-----------
      +  |  +   X   -
      0  |  0   X   0
      -  |  -   X   +

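Arithmetic Operations (2)

    ADD  |  +   0   -        (? = ambiguous)
    -----+-----------
      +  |  +   +   ?
      0  |  +   0   -
      -  |  ?   -   -

    SUB  |  +   0   -        (row – column; ? = ambiguous)
    -----+-----------
      +  |  ?   +   +
      0  |  -   0   +
      -  |  -   -   ?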
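The four sign-algebra operations can be written down compactly. This is a minimal sketch in which signs are the strings "+", "0" and "-", "?" stands for an ambiguous result and "X" for an undefined one (division by zero); the function names and this encoding are assumptions for illustration.

```python
AMBIGUOUS = "?"
UNDEFINED = "X"


def sign_add(a, b):
    """Qualitative addition: opposite signs give an ambiguous result."""
    if a == "0":
        return b
    if b == "0":
        return a
    return a if a == b else AMBIGUOUS


def sign_negate(a):
    return {"+": "-", "-": "+", "0": "0"}[a]


def sign_sub(a, b):
    """a - b is a + (-b), so equal non-zero signs are ambiguous."""
    return sign_add(a, sign_negate(b))


def sign_mult(a, b):
    if a == "0" or b == "0":
        return "0"
    return "+" if a == b else "-"


def sign_div(a, b):
    """Same sign rule as multiplication, except that dividing by zero is undefined."""
    return UNDEFINED if b == "0" else sign_mult(a, b)


print(sign_sub("+", "+"))   # ?
print(sign_add("+", "0"))   # +
print(sign_mult("-", "-"))  # +
print(sign_div("0", "0"))   # X
```

The first line is the case where both operands of a subtraction are [+]: the sign of the result cannot be determined from the signs alone.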

Arithmetic Operations (3)

A = B - C

where B & C both have value [+], A will be undefined

• Disambiguation
  – may be possible from other information
  – A = [+] if B > C
  – A = [0] if B = C
  – A = [-] if B < C
• Functional Relations
  – Y = M+(X)
  – Y = M-(X)

Curve Shapes

[Figure: table of curve shapes indexed by the signs (+, 0, -) of d1 and d2]

Transition Rules

• Intermediate Value Theorem (IVT)
  – States that for a continuous system, a function joining two points of opposite sign must pass through zero.
• Mean Value Theorem (MVT)
  – Defines the direction of change of a variable between two points.

    [++] [+o] [+-]
    [o+] [oo] [o-]
    [-+] [-o] [--]

Single Compartment System

plane 0
    f10 = k10.x1
    x1’ = u - f10

plane 1
    f10’ = k10.x1’
    x1’’ = u’ - f10’

plane 2
    f10’’ = k10.x1’’
    x1’’’ = u’’ - f10’’

[Figure: compartment 1 with input u and outflow k10.x1]

Models in Morven

(define-fuzzy-model <model_name>

(short-name <short_name_of_model>)

(variables <list-of [variable_name, bounds, quantity-space]>)

(auxiliary-variables <list-of auxiliary_variable_names>)

(input <list-of [input_name, bounds, quantity-space]>)

(constraints <list-of [differential_planes (list-of constraints)]>)

(print <list-of variable_names>)

)

A JMorven Model

model-name: single-tank
short-name: fst

NumSystemVariables: 2
variable: qo range: zero p-max NumDerivatives: 1 qspaces: tanks-quantity-space
variable: V range: zero p-max NumDerivatives: 2 qspaces: tanks-quantity-space tanks-quantity-space2

NumExogenousVariables: 1
variable: qi range: zero p-max NumDerivatives: 1 qspaces: tanks-quantity-space

Constraints:
NumDiffPlanes: 2

Plane: 0
NumConstraints: 2
Constraint: func (dt 0 qo) (dt 0 V)
NumMappings: 9
Mappings:
    n-max n-max
    n-large n-large
    n-medium n-medium
    n-small n-small
    zero zero
    p-small p-small
    p-medium p-medium
    p-large p-large
    p-max p-max
Constraint: sub (dt 1 V) (dt 0 qi) (dt 0 qo)

NumVarsToPrint: 3
VarsToPrint: V qi qo

A JMorven Quantity Space

NumQSpaces: 2

QSpaceName: tanks-quantity-space
NumQuantities: 9
    n-max     -1     -1     0     0.1
    n-large   -0.9   -0.75  0.05  0.15
    n-medium  -0.6   -0.4   0.1   0.1
    n-small   -0.25  -0.15  0.1   0.15
    zero       0      0     0     0
    p-small    0.15   0.25  0.15  0.1
    p-medium   0.4    0.6   0.1   0.1
    p-large    0.75   0.9   0.15  0.05
    p-max      1      1     0.1   0

QSpaceName: tanks-quantity-space2
NumQuantities: 5
    nl-dash   -1     -0.75  0     0.15
    ns-dash   -0.6   -0.15  0.1   0.15
    zero       0      0     0     0
    ps-dash    0.15   0.6   0.15  0.1
    pl-dash    0.75   1     0.15  0

Possible States

    state  vector      state  vector
      1    + + + +      22    + - o +
      2    + + + o      23    + - o o
      3    + + + -      24    + - o -
      4    + + o +      25    + - - +
      5    + + o o      26    + - - o
      6    + + o -      27    + - - -
      7    + + - +      28    o + + +
      8    + + - o      29    o + + o
      9    + + - -      30    o + + -
     10    + o + +      31    o + o +
     11    + o + o      32    o + o o
     12    + o + -      33    o + o -
     13    + o o +      34    o + - +
     14    + o o o      35    o + - o
     15    + o o -      36    o + - -
     16    + o - +      37    o o + +
     17    + o - o      38    o o + o
     18    + o - -      39    o o + -
     19    + - + +      40    o o o +
     20    + - + o      41    o o o o
     21    + - + -

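Step Response

[Figure: V plotted against time t]

Solution Space

[Figure: solution space in the qi–V plane, annotated with state numbers (extracted labels: 21, 147, 30)]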

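Cascaded Systems

plane 0
    qx = k1.h1
    qo = k2.h2
    h1’ = qi - qx
    h2’ = qx - qo

plane 1
    qx’ = k1.h1’
    qo’ = k2.h2’
    h1’’ = qi’ - qx’
    h2’’ = qx’ - qo’

plane 2
    qx’’ = k1.h1’’
    qo’’ = k2.h2’’
    h1’’’ = qi’’ - qx’’
    h2’’’ = qx’’ - qo’’

[Figure: Tank A (level h1, inflow qi) draining through qx into Tank B (level h2, outflow qo), alongside the equivalent compartment model: compartments 1 and 2 with input u and flows k12.x1 and k20.x2]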

Cascaded Systems Envisionment

[Figure: envisionment graph linking states 0 to 13]

    State   h1      h2      qx      qo
      0     [0 +]   [0 0]   [0 +]   [0 0]
      1     [0 +]   [+ -]   [0 +]   [+ -]
      2     [+ -]   [0 +]   [+ -]   [0 +]
      3     [+ -]   [+ -]   [+ -]   [+ -]
      4     [+ -]   [+ 0]   [+ -]   [+ 0]
      5     [+ -]   [+ +]   [+ -]   [+ +]
      6     [+ 0]   [0 +]   [+ 0]   [0 +]
      7     [+ 0]   [+ -]   [+ 0]   [+ -]
      8     [+ 0]   [+ 0]   [+ 0]   [+ 0]
      9     [+ 0]   [+ +]   [+ 0]   [+ +]
     10     [+ +]   [0 +]   [+ +]   [0 +]
     11     [+ +]   [+ -]   [+ +]   [+ -]
     12     [+ +]   [+ 0]   [+ +]   [+ 0]
     13     [+ +]   [+ +]   [+ +]   [+ +]

Cascaded Systems Solution Space

[Figure: solution space in the h1–h2 plane, with the lines labelled h1’ = 0 and states 0 to 13 marked]
