christopher-madden
CS4030: Bio-Computing
Revision Lecture
DNA Replication
• Prior to cell division, all the genetic instructions must be “copied” so that each new cell will have a complete set
• DNA polymerase is the enzyme that copies DNA
– Reads the old strand in the 3´ to 5´ direction
Over time, genes accumulate mutations
• Environmental factors
– Radiation
– Oxidation
• Mistakes in replication or repair
• Deletions, duplications, insertions, inversions, point mutations
Deletions
• Codon deletion: ACG ATA GCG TAT GTA TAG CCG…
– Effect depends on the protein, position, etc.
– Almost always deleterious
– Sometimes lethal
• Frame shift mutation:
ACG ATA GCG TAT GTA TAG CCG…
ACG ATA GCG ATG TAT AGC CG?…
– Almost always lethal
Why align sequences?
• The draft human genome is available
• Automated gene finding is possible
• Gene: AGTACGTATCGTATAGCGTAA
– What does it do?
• One approach: Is there a similar gene in another species?
– Align sequences with known genes
– Find the gene with the “best” match
Are there other sequences like this one?
1) Huge public databases - GenBank, Swissprot, etc.
2) Sequence comparison is the most powerful and reliable method to determine evolutionary relationships between genes
3) Similarity searching is based on alignment
4) BLAST and FASTA provide rapid similarity searching
a. rapid = approximate (heuristic)
b. false positives and false negatives
Similarity ≠ Homology
1) 25% similarity over ≥ 100 AAs is strong evidence for homology
2) Homology is an evolutionary statement which means “descent from a common ancestor”
– common 3D structure
– usually common function
– homology is all or nothing; you cannot say "50% homologous"
Comparing two sequences
• Point mutations, easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
• Indels are difficult, must align sequences:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
Scoring a sequence alignment
• Match score: +1
• Mismatch score: +0
• Gap penalty: –1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
• Matches: 18 × (+1)
• Mismatches: 2 × 0
• Gaps: 7 × (–1)
Score = +11
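The tally above can be checked with a few lines of Python. This is a minimal sketch: the function name `score_alignment` and its parameter names are illustrative and do not come from the lecture.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two aligned strings of equal length; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # gap column
        elif a == b:
            score += match      # identical bases
        else:
            score += mismatch   # substitution
    return score


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # 18 matches, 2 mismatches, 7 gaps -> 11
```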
Origination and length penalties
• We want to find alignments that are evolutionarily likely.
• Which of the following alignments seems more likely to you?
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGAT-------ATAGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
AC-T-TGA--CG-CGT-TA-TCTATCT
• We can achieve this by penalizing more for a new gap than for extending an existing gap
Scoring a sequence alignment (2)
• Match/mismatch score: +1/+0
• Origination/length penalty: –2/–1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
• Matches: 18 × (+1)
• Mismatches: 2 × 0
• Origination: 2 × (–2)
• Length: 7 × (–1)
Score = +7
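A sketch of the origination/length scheme, assuming each run of consecutive gaps in one sequence pays the origination penalty once and the length penalty once per gap column. The name `score_affine` and its parameters are illustrative only.

```python
def score_affine(top, bottom, match=1, mismatch=0, origination=-2, length=-1):
    """Score an alignment with a per-run origination penalty and a per-column length penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    open_row = None  # which row ("top" or "bottom") holds the gap run in progress
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != open_row:
                score += origination   # a new gap starts here
            score += length            # every gap column pays the length penalty
            open_row = row
        else:
            score += match if a == b else mismatch
            open_row = None
    return score


reference = "ACGTCTGATACGCCGTATAGTCTATCT"
print(score_affine(reference, "----CTGATTCGC---ATCGTCTATCT"))  # 18 - 4 - 7 = 7
```

Under the same scheme, the single long gap in `ACGTCTGAT-------ATAGTCTATCT` scores 20 − 2 − 7 = 11, while the six scattered gaps in `AC-T-TGA--CG-CGT-TA-TCTATCT` score 20 − 12 − 7 = 1, even though both have 20 matches and 7 gap columns.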
Scoring Similarity
1) Can only score aligned sequences
2) DNA is usually scored as identical or not
3) Modified scoring for gaps: single vs. multiple base gaps (gap extension)
4) AAs have varying degrees of similarity
– a. # of mutations to convert one to another
– b. chemical similarity
– c. observed mutation frequencies
5) PAM matrix calculated from observed mutations in protein families
DNA Scoring Matrix

Identity
     A   T   C   G
A    1   0   0   0
T    0   1   0   0
C    0   0   1   0
G    0   0   0   1

BLAST
     A   T   C   G
A    5  -4  -4  -4
T   -4   5  -4  -4
C   -4  -4   5  -4
G   -4  -4  -4   5

Transition/Transversion
     A   T   C   G
A    1  -5  -5  -1
T   -5   1  -1  -5
C   -5  -1   1  -5
G   -1  -5  -5   1
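The transition/transversion matrix follows a simple rule: identical bases score +1, a purine-purine or pyrimidine-pyrimidine substitution (a transition) scores −1, and any other substitution (a transversion) scores −5. A minimal sketch that rebuilds the matrix from that rule; the function and constant names are illustrative.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def transition_transversion_score(a, b):
    """Return +1 for identity, -1 for a transition, -5 for a transversion."""
    if a == b:
        return 1
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return -1 if same_class else -5


order = "ATCG"
matrix = [[transition_transversion_score(r, c) for c in order] for r in order]
for base, row in zip(order, matrix):
    print(base, row)
```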
The dynamic programming concept
• Suppose we are aligning:
ACTCG
ACAGTAG
• Last position choices:
Last column   Score   Remaining to align
G / G         +1      ACTC  / ACAGTA
G / -         -1      ACTC  / ACAGTAG
- / G         -1      ACTCG / ACAGTA
We can use a table
• Suppose we are aligning A with A…
         A
    0   -1
A  -1
Needleman-Wunsch: Step 1
• Each sequence along one axis
• Gap penalty multiples in first row/column
• 0 in [1,1]
Needleman-Wunsch: Step 2
• Vertical/Horiz. move: Score + (simple) gap penalty
• Diagonal move: Score + match/mismatch score
• Take the MAX of the three possibilities
Needleman-Wunsch: Step 2 (cont’d)
• Fill out the rest of the table likewise…
The optimal alignment score is calculated in the lower-right corner
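The two Needleman-Wunsch steps can be sketched as a table fill. This sketch assumes the scores used earlier in the lecture (match +1, mismatch 0, gap −1); `nw_table` is an illustrative name, and indices here start at 0 rather than 1.

```python
def nw_table(top, left, match=1, mismatch=0, gap=-1):
    """Fill the Needleman-Wunsch table; top runs along columns, left along rows."""
    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    # Step 1: multiples of the gap penalty in the first row and column.
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        table[i][0] = i * gap
    # Step 2: each cell is the MAX of the three possible moves.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # diagonal move
                table[i - 1][j] + gap,       # vertical move
                table[i][j - 1] + gap,       # horizontal move
            )
    return table


table = nw_table("ACTCG", "ACAGTAG")
print(table[-1][-1])  # optimal score in the lower-right corner: 2
```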
But what is the optimal alignment?
• To reconstruct the optimal alignment, we must determine where the MAX at each step came from…
A path corresponds to an alignment
• Vertical move = GAP in top sequence
• Horizontal move = GAP in left sequence
• Diagonal move = ALIGN both positions
• One path from the previous table:
• Corresponding alignment (start at the end):
AC--TCG
ACAGTAG
Score = +2
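A sketch of the traceback, assuming the same scores (match +1, mismatch 0, gap −1). It rebuilds the table and then walks back from the lower-right corner, asking at each cell which move produced the MAX. Ties are broken in favour of the diagonal move, which is an arbitrary choice, so the alignment returned may differ from the slide while having the same score. The name `needleman_wunsch` is illustrative.

```python
def needleman_wunsch(top, left, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_top, aligned_left) for a global alignment."""
    def pair(a, b):
        return match if a == b else mismatch

    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        table[i][0] = i * gap
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + pair(left[i - 1], top[j - 1]),
                table[i - 1][j] + gap,
                table[i][j - 1] + gap,
            )

    out_top, out_left = [], []
    i, j = len(left), len(top)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + pair(left[i - 1], top[j - 1]):
            out_top.append(top[j - 1])      # diagonal: align both positions
            out_left.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            out_top.append("-")             # vertical: gap in top sequence
            out_left.append(left[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])      # horizontal: gap in left sequence
            out_left.append("-")
            j -= 1
    return table[-1][-1], "".join(reversed(out_top)), "".join(reversed(out_left))


score, aligned_top, aligned_left = needleman_wunsch("ACTCG", "ACAGTAG")
print(score)
print(aligned_top)
print(aligned_left)
```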
Semi-global alignment
• Suppose we are aligning:
GCG
GGCG
• Which do you prefer?
G-CG    -GCG
GGCG    GGCG
• Semi-global alignment allows gaps at the ends for free.
– Initialize first row and column to all 0’s
– Allow free horizontal/vertical moves in last row and column
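A sketch of semi-global scoring under the same assumed scores (match +1, mismatch 0, gap −1). Free moves along the last row and column are implemented here by taking the best value found anywhere in the last row or last column, which is one common way to realise them. `semi_global_score` is an illustrative name.

```python
def semi_global_score(top, left, match=1, mismatch=0, gap=-1):
    """Best alignment score when gaps at either end of either sequence are free."""
    rows, cols = len(left) + 1, len(top) + 1
    # First row and column stay at 0: leading gaps cost nothing.
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,
                table[i - 1][j] + gap,
                table[i][j - 1] + gap,
            )
    # Trailing gaps are free: take the best cell in the last row or last column.
    last_row = max(table[-1])
    last_col = max(row[-1] for row in table)
    return max(last_row, last_col)


print(semi_global_score("GCG", "GGCG"))  # 3: GCG lines up with the last three bases of GGCG
```

With global scoring both candidate alignments of GCG and GGCG score 3 − 1 = 2; with free end gaps the second one, `-GCG` over `GGCG`, scores 3.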
Local alignment
• Global alignments – score the entire alignment
• Semi-global alignments – allow unscored gaps at the beginning or end of either sequence
• Local alignment – find the best matching subsequence
• Example:
CGATG
AAATGGA
• This is achieved by allowing a 4th alternative at each position in the table: zero, if the other alternatives are negative.
• Smith-Waterman Algorithm (1981).
• Mismatch = –1 this time
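A sketch of the local-alignment table fill with the fourth alternative (zero), using match +1, mismatch −1 as stated above, and assuming the gap penalty stays at −1. `smith_waterman_score` is an illustrative name, and only the best score is returned, not the alignment itself.

```python
def smith_waterman_score(top, left, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences."""
    rows, cols = len(left) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if left[i - 1] == top[j - 1] else mismatch
            table[i][j] = max(
                0,                           # 4th alternative: start afresh
                table[i - 1][j - 1] + pair,  # diagonal move
                table[i - 1][j] + gap,       # vertical move
                table[i][j - 1] + gap,       # horizontal move
            )
            best = max(best, table[i][j])
    return best


print(smith_waterman_score("CGATG", "AAATGGA"))  # 3, from the shared subsequence ATG
```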
CBA - Artificial Immune Systems
Classical Immunity
• The purpose of the immune system is defence
• Innate and acquired immunity
– Innate is the first line of defence. It is germ-line encoded (passed from parents) and quite ‘static’ (but not totally static)
– Adaptive (acquired) is somatic (cellular) and is acquired by the host over its lifetime. Very dynamic.
– These two interact and affect each other
Multiple layers of the immune system
[Figure: pathogens meet successive layers: skin, biochemical barriers, innate immune response (phagocytes), adaptive immune response (lymphocytes)]
Innate Immunity
• May take days to remove an infection; if it fails, then the adaptive response may take over
• Macrophages and neutrophils are actors
– Bind to common (known) things. This knowledge has evolved and been passed from generation to generation.
Processes within the Immune System (very basically)
• Negative Selection
– Censoring, in the thymus gland, of T-cells that recognise self
• Defining normal system behavior
• Clonal Selection
– Proliferation and differentiation of cells when they have recognised something
• Generalise and learn
• Self vs Non-Self
Clonal Selection
Immune Responses
[Figure: antibody concentration against time. Primary response (antigen Ag1): lag, then response to Ag1. Secondary response (antigens Ag1, Ag2): response to Ag1, and lag, then response to Ag2. Cross-reactive response (antigen Ag1 + Ag3): lag, then response to Ag1 + Ag3.]
A Framework for AIS [De Castro and Timmis, 2002]
[Figure: layered framework leading from Application to Solution through the AIS layers Representation, Affinity and Algorithms]
• Representation (shape-space): binary, integer, real-valued, symbolic
• Affinity: Euclidean, Manhattan, Hamming
• Algorithms: bone marrow models, clonal selection, negative selection, positive selection, immune network models
Lecture 4 CBA - Artificial Immune Systems
The Representation Layer
Shape-Space
• An antibody can recognise any antigen whose complement lies within a small surrounding region of width ε (the cross-reactivity threshold)
• This results in a volume vε known as the recognition region of the antibody
[Figure: shape-space S of volume V containing antibody recognition regions vε] [Perelson, 1989]
The Affinity Layer
• Computationally, the degree of interaction of an antibody-antigen or antibody-antibody can be evaluated by a distance or affinity measure
• The choice of affinity measure is crucial:
– It alters the shape-space topology
– It will introduce an inductive bias into the algorithm
– It needs to take into account the data-set used and the problem you are trying to solve
Affinity
• Affinity through shape similarity. On the left, a region where all antigens present the same affinity with the given antibody. On the right, antigens in the region b have a higher affinity than those in a
[Figure: left, antibody (Ab) with geometric region a; right, geometric regions a and b]
Hamming Shape Space
• 1 if Abi != Agi, 0 otherwise (XOR operator)
Ab: 0 0 1 1 0 0 1 1
Ag: 1 1 1 0 1 1 0 1
Hamming Shape Space
• (a) Hamming distance
Ab:  0 0 1 1 0 0 1 1
Ag:  1 1 1 0 1 1 0 1
XOR: 1 1 0 1 1 1 1 0
Affinity: 6
• (b) r-contiguous bits rule
Ab:  0 0 1 1 0 0 1 1
Ag:  1 1 1 0 1 1 0 1
XOR: 1 1 0 1 1 1 1 0
Affinity: 4
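Both affinity measures can be computed from the XOR string. The sketch below takes the r-contiguous affinity to be the length of the longest run of 1s in the XOR, which is consistent with the value 4 above. Function names are illustrative.

```python
def xor_bits(ab, ag):
    """Position-wise XOR: 1 where antibody and antigen differ, 0 otherwise."""
    return [a ^ g for a, g in zip(ab, ag)]


def hamming_affinity(ab, ag):
    """(a) Hamming distance: number of differing positions."""
    return sum(xor_bits(ab, ag))


def contiguous_affinity(ab, ag):
    """(b) Length of the longest run of consecutive differing positions."""
    longest = run = 0
    for bit in xor_bits(ab, ag):
        run = run + 1 if bit else 0
        longest = max(longest, run)
    return longest


ab = [0, 0, 1, 1, 0, 0, 1, 1]
ag = [1, 1, 1, 0, 1, 1, 0, 1]
print(xor_bits(ab, ag))             # [1, 1, 0, 1, 1, 1, 1, 0]
print(hamming_affinity(ab, ag))     # 6
print(contiguous_affinity(ab, ag))  # 4
```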
Mutation - Binary
• Single-point mutation
Original string: 1 0 0 0 1 1 1 0
Mutated string:  1 0 0 0 0 1 1 0   (bit to be mutated: 5th)
• Multi-point mutation
Original string: 1 0 0 0 1 1 1 0
Mutated string:  0 0 0 0 0 1 1 0   (bits to be mutated: 1st and 5th)
Affinity Proportional Mutation
• Affinity maturation is controlled
– Proportional to antigenic affinity
– α(D*) = exp(−ρ D*)
– α = mutation rate
– D* = affinity
– ρ = control parameter
[Figure: α plotted against D*, both axes from 0 to 1, with curves for ρ = 5, ρ = 10 and ρ = 20]
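A sketch of the mutation-rate rule, the rate being the exponential of minus the control parameter times the affinity D*. It assumes D* is normalised to the range 0 to 1 as on the plot axes. Treating the rate as an independent per-bit flip probability is an assumption of this sketch, not something the slide states. `mutation_rate`, `mutate` and `rho` are illustrative names.

```python
import math
import random


def mutation_rate(affinity, rho):
    """Mutation rate for a normalised affinity D* and control parameter rho."""
    return math.exp(-rho * affinity)


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate` (an assumed scheme)."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


rng = random.Random(0)
original = [1, 0, 0, 0, 1, 1, 1, 0]
for d_star in (0.0, 0.2, 0.8):
    rate = mutation_rate(d_star, rho=5)
    print(d_star, round(rate, 3), mutate(original, rate, rng))
```

Higher affinity gives a lower rate, and a larger control parameter makes the rate fall off more steeply.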
The Algorithms Layer
• Bone Marrow models (Hightower, Oprea, Kim)
• Clonal Selection
– CLONALG (De Castro), B-Cell (Kelsey)
• Negative Selection
– Forrest, Dasgupta, Kim, …
• Network Models
– Continuous models: Jerne, Farmer
– Discrete models: RAIN (Timmis), AiNET (De Castro)
Clonal Selection – CLONALG
1. Initialisation
• Create a random population of individuals (P)
2. Antigenic presentation
• For each antigenic pattern in the data-set S do:
a. Affinity evaluation
• Present it to the population P and determine its affinity with each element of the population
b. Clonal selection and expansion
• Select n highest affinity elements of P
• Generate clones proportional to their affinity with the antigen (higher affinity = more clones)
c. Affinity maturation
• Mutate each clone
• High affinity = low mutation rate and vice-versa
• Add mutated individuals to population P
• Reselect best individual to be kept as memory m of the antigen presented
d. Metadynamics
• Replace a number r of individuals with low affinity with randomly generated new ones
3. Cycle
• Repeat step 2 until a certain stopping criterion is met
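The CLONALG steps can be sketched for bitstring individuals as below. This is a simplified illustration, not the published algorithm. It assumes Hamming distance as affinity, a clone count scaled linearly with affinity, a per-bit mutation probability that decays exponentially with normalised affinity, and a fixed number of generations as the stopping criterion. All parameter names and default values are hypothetical.

```python
import math
import random


def affinity(antibody, antigen):
    """Hamming distance, used here as the affinity measure (an assumption)."""
    return sum(a != g for a, g in zip(antibody, antigen))


def clonalg(antigens, pop_size=20, n_select=5, max_clones=6,
            r_replace=4, rho=3.0, generations=30, seed=0):
    rng = random.Random(seed)
    length = len(antigens[0])

    def random_individual():
        return tuple(rng.randint(0, 1) for _ in range(length))

    # 1. Initialisation: random population P and one memory cell per antigen.
    population = [random_individual() for _ in range(pop_size)]
    memory = [random_individual() for _ in antigens]

    for _ in range(generations):                    # 3. Cycle
        for k, antigen in enumerate(antigens):      # 2. Antigenic presentation
            def score(individual):
                return affinity(individual, antigen)

            # a. Affinity evaluation
            population.sort(key=score, reverse=True)

            # b. Clonal selection and expansion: higher affinity, more clones.
            clones = []
            for individual in population[:n_select]:
                n_clones = max(1, round(max_clones * score(individual) / length))
                clones.extend([individual] * n_clones)

            # c. Affinity maturation: high affinity means a low mutation rate.
            for clone in clones:
                rate = math.exp(-rho * score(clone) / length)
                mutant = tuple(b ^ 1 if rng.random() < rate else b for b in clone)
                population.append(mutant)
            best = max(population, key=score)
            if score(best) > score(memory[k]):
                memory[k] = best                    # keep as memory of this antigen

            # d. Metadynamics: replace the r lowest-affinity individuals.
            population.sort(key=score, reverse=True)
            population = population[:pop_size - r_replace]
            population += [random_individual() for _ in range(r_replace)]
    return memory


S = [(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
     (1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1)]
memory = clonalg(S)
for antigen, cell in zip(S, memory):
    print(cell, affinity(cell, antigen))
```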
Naive Application of Clonal Selection
• Generate a set of detectors capable of identifying simple digits
• Represented as a simple bitmap
S = [s1; s2], where
s1 = 0 1 0 0 1 0 0 1 0 0 1 0
s2 = 1 0 1 1 0 1 1 1 1 0 0 1
Representation
• Each individual is a bitstring
• Use Hamming distance as affinity metric
M = [ 12   2   1  11   9
       2  12   9   3   1 ]
Evolution of Detectors
[Figure: bitmaps of Clone 1, Clone 2 and Clone 3, shown first as clones and then as mutated clones]
Lecture 5 CBA - Artificial Immune Systems
Negative Selection Algorithms
• Define Self as a normal pattern of activity or stable behavior of a system/process
– A collection of logically split segments (equal-size) of pattern sequence.
– Represent the collection as a multiset S of strings of length l over a finite alphabet.
• Generate a set R of detectors, each of which fails to match any string in S.
• Monitor new observations (of S) for changes by continually testing the detectors against representatives of S. If any detector ever matches, a change (or deviation) must have occurred in system behavior.
Illustration of NS Algorithm:
[Figure: Self and Non_Self regions]
Matching with r = 2:
Match:        1011 and 1000
Don’t Match:  1011 and 1101
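A sketch of negative selection on short bitstrings, assuming the matching rule shown in the illustration: two strings match when they agree in at least r contiguous positions. For simplicity it enumerates every candidate string instead of generating detectors at random, which is only feasible for very short strings. All function names are illustrative.

```python
import itertools


def r_contiguous_match(a, b, r):
    """True if a and b agree in at least r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, length, r):
    """Keep every candidate string that fails to match all self strings."""
    candidates = ("".join(bits) for bits in itertools.product("01", repeat=length))
    return [c for c in candidates
            if not any(r_contiguous_match(c, s, r) for s in self_set)]


def is_anomalous(observation, detectors, r):
    """A change is flagged as soon as any detector matches the observation."""
    return any(r_contiguous_match(d, observation, r) for d in detectors)


print(r_contiguous_match("1011", "1000", 2))  # True: the first two bits agree
print(r_contiguous_match("1011", "1101", 2))  # False: no two adjacent bits agree

self_set = ["1011"]
detectors = generate_detectors(self_set, length=4, r=2)
print(detectors)
print(is_anomalous("1011", detectors, 2), is_anomalous("0100", detectors, 2))
```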
Negative Selection
• Cross-reactivity threshold = 1
M = [ 12   2   1  11   9
       2  12   9   3   1 ]
• Here M[1,1], M[1,4] and M[2,2] are above the threshold
• Add these to Available repertoire
• Eliminate the rest.
QR Motivations
• Problems with RBS
– Reasoning from First Principles
– Dangers with “nearest approximation”
• Second Generation Expert Systems
– Use deep knowledge
– Provide explanations of reasoning process
• Commonsense reasoning
– Capture how humans reason
– Enable use of appropriate causality
• Model reuse
– Improved ease of ES maintenance
Arithmetic Operations
• Sign Algebra (row operand first, column operand second; X = undefined)

MULT   +   0   -
  +    +   0   -
  0    0   0   0
  -    -   0   +

DIV    +   0   -
  +    +   X   -
  0    0   X   0
  -    -   X   +
Arithmetic Operations (2)
(row operand first, column operand second; ? = ambiguous)

ADD    +   0   -
  +    +   +   ?
  0    +   0   -
  -    ?   -   -

SUB    +   0   -
  +    ?   +   +
  0    -   0   +
  -    -   -   ?
Arithmetic Operations (3)
A = B - C
where B & C both have value [+], A will be undefined
• Disambiguation
– may be possible from other information
– A = [+] if B > C
– A = [0] if B = C
– A = [-] if B < C
• Functional Relations
– Y = M+(X)
– Y = M-(X)
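Sign algebra is small enough to sketch directly. The functions below use "?" for an ambiguous result and show how A = B - C is ambiguous when both operands are [+], and how knowing the magnitudes resolves it. Function names are illustrative.

```python
def sign_neg(s):
    """Negate a qualitative sign; ambiguity stays ambiguous."""
    return {"+": "-", "0": "0", "-": "+", "?": "?"}[s]


def sign_add(a, b):
    """Qualitative addition: opposite signs give an ambiguous result."""
    if "?" in (a, b):
        return "?"
    if a == "0":
        return b
    if b == "0":
        return a
    return a if a == b else "?"


def sign_sub(a, b):
    """Qualitative subtraction a - b, defined as a + (-b)."""
    return sign_add(a, sign_neg(b))


def sign_mult(a, b):
    """Qualitative multiplication: zero dominates, like signs give +."""
    if "0" in (a, b):
        return "0"
    if "?" in (a, b):
        return "?"
    return "+" if a == b else "-"


def sign_of(x):
    """Sign of a real number, used when magnitudes are known."""
    return "+" if x > 0 else "-" if x < 0 else "0"


print(sign_sub("+", "+"))  # "?": B and C are both [+]
print(sign_of(3.0 - 5.0))  # "-": disambiguated because B < C
```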
Curve Shapes
[Figure: grid of curve shapes indexed by the signs (+, 0, -) of d1 and d2]
Transition Rules
• Intermediate Value Theorem (IVT)
– States that for a continuous system, a function joining two points of opposite sign must pass through zero.
• Mean Value Theorem (MVT)
– Defines the direction of change of a variable between two points.
[++] [+o] [+-]
[o+] [oo] [o-]
[-+] [-o] [--]
Single Compartment System
plane 0
f10 = k10.x1
x1’ = u - f10
plane 1
f10’ = k10.x1’
x1’’ = u’ - f10’
plane 2
f10’’ = k10.x1’’
x1’’’ = u’’ - f10’’
[Figure: compartment 1 with input u and outflow k10.x1]
Models in Morven
(define-fuzzy-model <model_name>
(short-name <short_name_of_model>)
(variables <list-of [variable_name, bounds, quantity-space]>)
(auxiliary-variables <list-of auxiliary_variable_names>)
(input <list-of [input_name, bounds, quantity-space]>)
(constraints <list-of [differential_planes (list-of constraints)]>)
(print <list-of variable_names>)
)
A JMorven Model
model-name: single-tank
short-name: fst
NumSystemVariables: 2
variable: qo range: zero p-max NumDerivatives: 1 qspaces: tanks-quantity-space
variable: V range: zero p-max NumDerivatives: 2 qspaces: tanks-quantity-space tanks-quantity-space2
NumExogenousVariables: 1
variable: qi range: zero p-max NumDerivatives: 1 qspaces: tanks-quantity-space
Constraints:
NumDiffPlanes: 2
Plane: 0 NumConstraints: 2
Constraint: func (dt 0 qo) (dt 0 V) NumMappings: 9
Mappings:
n-max n-max
n-large n-large
n-medium n-medium
n-small n-small
zero zero
p-small p-small
p-medium p-medium
p-large p-large
p-max p-max
Constraint: sub (dt 1 V) (dt 0 qi) (dt 0 qo)
NumVarsToPrint: 3 VarsToPrint: V qi qo
A JMorven Quantity Space
NumQSpaces: 2
QSpaceName: tanks-quantity-space
NumQuantities: 9
n-max -1 -1 0 0.1
n-large -0.9 -0.75 0.05 0.15
n-medium -0.6 -0.4 0.1 0.1
n-small -0.25 -0.15 0.1 0.15
zero 0 0 0 0
p-small 0.15 0.25 0.15 0.1
p-medium 0.4 0.6 0.1 0.1
p-large 0.75 0.9 0.15 0.05
p-max 1 1 0.1 0
QSpaceName: tanks-quantity-space2
NumQuantities: 5
nl-dash -1 -0.75 0 0.15
ns-dash -0.6 -0.15 0.1 0.15
zero 0 0 0 0
ps-dash 0.15 0.6 0.15 0.1
pl-dash 0.75 1 0.15 0
Possible States
state  vector      state  vector
1      + + + +     22     + - o +
2      + + + o     23     + - o o
3      + + + -     24     + - o -
4      + + o +     25     + - - +
5      + + o o     26     + - - o
6      + + o -     27     + - - -
7      + + - +     28     o + + +
8      + + - o     29     o + + o
9      + + - -     30     o + + -
10     + o + +     31     o + o +
11     + o + o     32     o + o o
12     + o + -     33     o + o -
13     + o o +     34     o + - +
14     + o o o     35     o + - o
15     + o o -     36     o + - -
16     + o - +     37     o o + +
17     + o - o     38     o o + o
18     + o - -     39     o o + -
19     + - + +     40     o o o +
20     + - + o     41     o o o o
21     + - + -
Step Response
[Figure: step response, V against t]
Solution Space
[Figure: solution space with axes V and qi; labelled states 21, 14, 7 and 30]
Cascaded Systems
plane 0
qx = k1.h1
qo = k2.h2
h1’ = qi - qx
h2’ = qx - qo
plane 1
qx’ = k1.h1’
qo’ = k2.h2’
h1’’ = qi’ - qx’
h2’’ = qx’ - qo’
plane 2
qx’’ = k1.h1’’
qo’’ = k2.h2’’
h1’’’ = qi’’ - qx’’
h2’’’ = qx’’ - qo’’
[Figure: Tank A (level h1) and Tank B (level h2) with flows qi, qx and qo; compartments 1 and 2 with input u and flows k12.x1 and k20.x2]
Cascaded Systems Envisionment
[Figure: envisionment graph over states 0 to 13]
State  h1     h2     qx     qo
0      [0 +]  [0 0]  [0 +]  [0 0]
1      [0 +]  [+ -]  [0 +]  [+ -]
2      [+ -]  [0 +]  [+ -]  [0 +]
3      [+ -]  [+ -]  [+ -]  [+ -]
4      [+ -]  [+ 0]  [+ -]  [+ 0]
5      [+ -]  [+ +]  [+ -]  [+ +]
6      [+ 0]  [0 +]  [+ 0]  [0 +]
7      [+ 0]  [+ -]  [+ 0]  [+ -]
8      [+ 0]  [+ 0]  [+ 0]  [+ 0]
9      [+ 0]  [+ +]  [+ 0]  [+ +]
10     [+ +]  [0 +]  [+ +]  [0 +]
11     [+ +]  [+ -]  [+ +]  [+ -]
12     [+ +]  [+ 0]  [+ +]  [+ 0]
13     [+ +]  [+ +]  [+ +]  [+ +]
Cascaded Systems Solution Space
[Figure: solution space with axes h1 and h2, the line h1’=0, and states 0 to 13]