Algebraic Statistics for Computational Biology
Lior Pachter and Bernd Sturmfels
Ch.5: Parametric Inference R. Mihaescu
Presentation: Angelina Vidali
Algebraic & Geometric Algorithms in Molecular Biology. Instructor: I. Emiris
Tropical arithmetic
Convenient algebraic structure for stating dynamic programming algorithms: the tropical semiring $(\mathbb{R} \cup \{\infty\}, \oplus, \odot)$, where
$x \oplus y := \min(x, y), \qquad x \odot y := x + y$
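A minimal Python sketch of the tropical operations, assuming plain floats as elements. The names `t_add`, `t_mul`, and `t_matvec` are illustrative and do not come from the source.

```python
import math

INF = math.inf  # neutral element of tropical addition: min(x, INF) == x


def t_add(x, y):
    """Tropical sum: x (+) y = min(x, y)."""
    return min(x, y)


def t_mul(x, y):
    """Tropical product: x (.) y = x + y."""
    return x + y


def t_matvec(A, v):
    """Tropical matrix-vector product: result[i] = min_j (A[i][j] + v[j])."""
    out = []
    for row in A:
        acc = INF
        for a, x in zip(row, v):
            acc = t_add(acc, t_mul(a, x))
        out.append(acc)
    return out
```

The matrix-vector product is the building block of the dynamic programming recursions: one step of a shortest-path computation is one tropical matrix-vector product.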
The polytope algebra $(\mathcal{P}_d, \oplus, \odot)$
Natural higher-dimensional generalization:
$P \oplus Q := \mathrm{conv}(P \cup Q)$ (convex hull)
$P \odot Q := P + Q$ (Minkowski sum)
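A sketch of the two polytope operations for d = 2, assuming a polytope is represented by a list of integer points whose convex hull it is. The hull routine is Andrew's monotone chain, chosen here for brevity; the names `conv`, `p_add`, and `p_mul` are illustrative.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def conv(points):
    """Vertices of the convex hull in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def p_add(P, Q):
    """Polytope sum: convex hull of the union."""
    return conv(list(P) + list(Q))


def p_mul(P, Q):
    """Polytope product: Minkowski sum, as the hull of all pairwise sums."""
    return conv([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])
```

Taking all pairwise sums is quadratic in the number of vertices; it keeps the sketch short but is not how an efficient implementation would compute Minkowski sums.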
Inference
From observed random variables Y1 = σ1, …, Yn = σn
we want to infer values for the hidden random variables X1, …, Xm (unknown biological data), e.g.:
• How do two sequences align?
MAP estimation: given an observation σ1, …, σn, which is the most probable explanation X1 = h1, …, Xm = hm?
Model parameters give the transition probabilities $p_{h\sigma}$ from a hidden state h to an observed state σ.
Observation: σ1,…,σn : Known biological data
Observation: σ1,…,σn
We want to compute an explanation for the observation:
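the sequence h1, …, hm which yields the maximum a posteriori probability (MAP):
$\max_{h_1, \ldots, h_m} P(X_1 = h_1, \ldots, X_m = h_m, Y_1 = \sigma_1, \ldots, Y_n = \sigma_n)$
We can efficiently compute the marginal probabilities:
$p_{\sigma_1 \cdots \sigma_n} = \sum_{h_1, \ldots, h_m} P(X_1 = h_1, \ldots, X_m = h_m, Y_1 = \sigma_1, \ldots, Y_n = \sigma_n)$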
Hidden Markov Model (HMM)
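Computation of the marginal probabilities:
$p_{\sigma_1 \cdots \sigma_n} = \sum_{h_1, \ldots, h_n} p_{h_1\sigma_1}\, p'_{h_1h_2}\, p_{h_2\sigma_2}\, p'_{h_2h_3} \cdots p'_{h_{n-1}h_n}\, p_{h_n\sigma_n}$
$p_\sigma$ has the decomposition
$p_{\sigma_1 \cdots \sigma_n} = \sum_{h_n=1}^{s} p_{h_n\sigma_n} \Bigl( \sum_{h_{n-1}=1}^{s} p'_{h_{n-1}h_n}\, p_{h_{n-1}\sigma_{n-1}} \Bigl( \cdots \Bigl( \sum_{h_1=1}^{s} p'_{h_1h_2}\, p_{h_1\sigma_1} \Bigr) \cdots \Bigr) \Bigr)$
(s is the number of hidden states), which gives the “Forward algorithm”.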
Figure labels: Markov chain (hidden states); independent probabilities (observed states)
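A sketch of the Forward algorithm for the HMM marginal probability, assuming hidden states and observed symbols are encoded as integers starting at 0. Here `trans[a][b]` stands for p'_{ab} and `emit[a][c]` for p_{ac}; these names, and the brute-force function used for comparison, are illustrative. No initial distribution is included, on the assumption that the HMM formula contains none.

```python
from itertools import product


def forward(sigma, trans, emit):
    """Marginal probability of the observation sigma in O(n * s^2) steps."""
    s = len(trans)
    # alpha[h]: sum of all partial products whose last hidden state is h
    alpha = [emit[h][sigma[0]] for h in range(s)]
    for obs in sigma[1:]:
        alpha = [
            emit[h][obs] * sum(alpha[g] * trans[g][h] for g in range(s))
            for h in range(s)
        ]
    return sum(alpha)


def brute_force(sigma, trans, emit):
    """Same quantity as a sum over all s^n hidden sequences."""
    s, n = len(trans), len(sigma)
    total = 0.0
    for h in product(range(s), repeat=n):
        term = emit[h[0]][sigma[0]]
        for t in range(1, n):
            term *= trans[h[t - 1]][h[t]] * emit[h[t]][sigma[t]]
        total += term
    return total
```

Both functions compute the same number; the first uses the nested decomposition, the second expands the full sum and is only practical for very short sequences.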
Viterbi algorithm
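Tropicalization of the problem of computing $p_\sigma$: $u_{ij} = -\log p'_{ij}$, $v_{ij} = -\log p_{ij}$
$\min_{h_1, \ldots, h_n} \bigl( v_{h_1\sigma_1} + u_{h_1h_2} + v_{h_2\sigma_2} + \cdots + u_{h_{n-1}h_n} + v_{h_n\sigma_n} \bigr)$
We can now efficiently find an explanation h1, …, hn for the observation σ1, …, σn using the recursion:
$\min_{h_n} \Bigl( v_{h_n\sigma_n} + \min_{h_{n-1}} \Bigl( u_{h_{n-1}h_n} + v_{h_{n-1}\sigma_{n-1}} + \cdots + \min_{h_1} \bigl( u_{h_1h_2} + v_{h_1\sigma_1} \bigr) \cdots \Bigr) \Bigr)$
It is again the Forward algorithm, now carried out in tropical arithmetic.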
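A sketch of the tropicalized recursion in Python, assuming integer-coded states and symbols and strictly positive probabilities (so that the logarithms exist). `trans[a][b]` stands for p'_{ab} and `emit[a][c]` for p_{ac}; the function name and the back-pointer bookkeeping are illustrative.

```python
import math


def viterbi(sigma, trans, emit):
    """Return (minimal cost, hidden sequence) using min and + only."""
    s = len(trans)
    u = [[-math.log(x) for x in row] for row in trans]  # u_ij = -log p'_ij
    v = [[-math.log(x) for x in row] for row in emit]   # v_ij = -log p_ij
    cost = [v[h][sigma[0]] for h in range(s)]
    back = []  # back[t][h]: best predecessor of state h at step t + 1
    for obs in sigma[1:]:
        prev, new = [], []
        for h in range(s):
            g = min(range(s), key=lambda g: cost[g] + u[g][h])
            prev.append(g)
            new.append(cost[g] + u[g][h] + v[h][obs])
        back.append(prev)
        cost = new
    h = min(range(s), key=lambda h: cost[h])
    best = cost[h]
    path = [h]
    for prev in reversed(back):  # follow back-pointers to recover h_1..h_n
        h = prev[h]
        path.append(h)
    path.reverse()
    return best, path
```

The structure mirrors the sum-product recursion with sums replaced by `min` and products by `+`; `math.exp(-best)` recovers the probability of the returned explanation.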
Pair Hidden Markov Model (pHMM)
The algebraic statistical model for sequence alignment, known as the pair hidden Markov model, is the image of the map
where An,m is the set of all alignments of the sequences σ1, σ2.
• The Needleman-Wunsch algorithm for finding the shortest path in the alignment graph is the tropicalization of the pair hidden Markov model for sequence alignment.
Example: n=5, m=4. An alignment of gttta and gtgc:
gttta-
gt--gc
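A sketch of Needleman-Wunsch as a min-plus dynamic program over the alignment graph. The unit costs for mismatches and gaps are an assumption made for illustration; in the tropicalized pair HMM the edge weights would be negative logarithms of the model parameters.

```python
def needleman_wunsch(a, b, mismatch=1, gap=1):
    """Cost of a shortest path from corner (0, 0) to corner (n, m)."""
    n, m = len(a), len(b)
    # D[i][j]: minimal cost of aligning a[:i] with b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(
                D[i - 1][j - 1] + sub,  # diagonal edge: match or mismatch
                D[i - 1][j] + gap,      # gap in b
                D[i][j - 1] + gap,      # gap in a
            )
    return D[n][m]
```

Each cell combines its three incoming edges with `min` and `+`, which are the tropical sum and product.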
The polytope propagation algorithm
• Tropical sum-product algorithm in general fashion.
f is the density function for a statistical model:
$f(p) = \sum_{i=1}^{d} p_1^{e_{i1}} \cdots p_k^{e_{ik}}$
From the d monomials, find the one that attains the maximum
$\max_{h_1, \ldots, h_m} P(X_1 = h_1, \ldots, X_m = h_m, Y_1 = \sigma_1, \ldots, Y_n = \sigma_n)$
Solution:
• Tropicalization: $w_i = -\log p_i$
• Computation in the polytope algebra
Density function for a statistical model:
$f(p_1, p_2) = p_1^3 + p_1^2 p_2^2 + p_1 p_2^2 + p_1 + p_2^4$
• Find an explanation
• Find the index j of the monomial with maximal value
Tropicalization: $w_i = -\log p_i$
• Find the index j of the monomial that minimizes the function $e_j \cdot w$:
$\min(3w_1,\ 2w_1 + 2w_2,\ w_1 + 2w_2,\ w_1,\ 4w_2)$
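A sketch of this tropical evaluation for the example polynomial, assuming positive parameter values. The exponent vectors are read off from f; the function name `explanation` and the 0-based index convention are illustrative.

```python
import math

# Exponent vectors of p1^3, p1^2 p2^2, p1 p2^2, p1, p2^4, in that order
EXPONENTS = [(3, 0), (2, 2), (1, 2), (1, 0), (0, 4)]


def explanation(p, exponents=EXPONENTS):
    """Return (j, cost): 0-based index of the monomial minimizing e_j . w."""
    w = [-math.log(x) for x in p]  # tropicalization: w_i = -log p_i
    costs = [sum(e * wi for e, wi in zip(ej, w)) for ej in exponents]
    j = min(range(len(costs)), key=costs.__getitem__)
    return j, costs[j]
```

Because -log is decreasing, the monomial with minimal cost is the one with maximal value, and `math.exp(-cost)` gives that value back.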
Explanations are vertices of the Newton polytope of f
$f(p_1, p_2) = p_1^3 + p_1^2 p_2^2 + p_1 p_2^2 + p_1 + p_2^4$
We find a point for each exponent vector of a monomial.
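As a worked instance for the example polynomial (derived here from f, not stated on the slide), the exponent vectors, the Newton polytope, and its vertices are:

```latex
\begin{align*}
\text{exponent vectors: } & (3,0),\ (2,2),\ (1,2),\ (1,0),\ (0,4) \\
\mathrm{NP}(f) &= \mathrm{conv}\{(3,0),\ (2,2),\ (1,2),\ (1,0),\ (0,4)\} \\
\text{vertices: } & (1,0),\ (3,0),\ (2,2),\ (0,4) \\
(1,2) &= \tfrac{1}{3}(1,0) + \tfrac{1}{3}(2,2) + \tfrac{1}{3}(0,4)
\end{align*}
```

Since (1,2) is a convex combination of three other exponent vectors, the monomial $p_1 p_2^2$ is never the unique minimizer of $e_j \cdot w$; it can only tie with the others.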
Normal fan
• The normal fan partitions the parameter space into regions such that the explanations for all parameter values in a given region are given by the polytope vertex (or face) associated with that region.
Parametric MAP estimation problem
• Local: given a choice of parameters, determine the set of all parameters with the same MAP estimate.
• Solution: Computation of the normal cone of the Newton Polytope.
• Global: asks for a partition of the space of parameters such that any two parameters lie in the same part iff they yield the same MAP estimate.
• Solution: Computation of the normal fan of the Newton Polytope.
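A sketch of the local and global questions for a polytope given by its exponent vectors, assuming the minimization convention w = -log p, so that the cone of a vertex is the set of w at which that vertex minimizes e · w. All function names are illustrative, and the comparisons are exact, so the sketch is meant for integer or otherwise exactly representable inputs.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cone_inequalities(vertex, exponents):
    """Vectors d = e - vertex; w is in the cone iff d . w >= 0 for every d."""
    return [
        tuple(x - y for x, y in zip(e, vertex))
        for e in exponents
        if tuple(e) != tuple(vertex)
    ]


def in_normal_cone(w, vertex, exponents):
    """Local question: does `vertex` minimize e . w over all exponent vectors?"""
    return all(dot(d, w) >= 0 for d in cone_inequalities(vertex, exponents))


def same_map_estimate(w1, w2, exponents):
    """Global question for two parameter vectors: same set of minimizers?"""
    def argmin_set(w):
        vals = [dot(e, w) for e in exponents]
        m = min(vals)
        return {e for e, val in zip(exponents, vals) if val == m}
    return argmin_set(w1) == argmin_set(w2)
```

A vector w that lies in the cones of two vertices at once sits on the boundary between two regions of the fan, which corresponds to an edge of the polytope and to a tie between two explanations.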