Algebraic Statistics for Computational Biology Lior Pachter and Bernd Sturmfels Ch.5: Parametric...


Algebraic Statistics for Computational Biology

Lior Pachter and Bernd Sturmfels

Ch.5: Parametric Inference R. Mihaescu

Presentation: Angelina Vidali

Algebraic & Geometric Algorithms in Molecular Biology. Instructor: I. Emiris

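Tropical arithmetic

Convenient algebraic structure for stating dynamic programming algorithms: the tropical semiring $(\mathbb{R} \cup \{\infty\}, \oplus, \odot)$, where

$x \oplus y := \min(x, y)$

$x \odot y := x + y$

Natural higher-dimensional generalization: the polytope algebra $(\mathcal{P}_d, \oplus, \odot)$, where

$P \oplus Q := \mathrm{conv}(P \cup Q)$ (convex hull)

$P \odot Q := P + Q$ (Minkowski sum)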
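
To make the link between tropical arithmetic and dynamic programming concrete, here is a minimal Python sketch. The helper names (`t_add`, `t_mul`, `t_matmul`) and the small weight matrix are illustrative assumptions, not part of the slides. A matrix product in the tropical semiring computes cheapest two-step paths.

```python
import math

INF = math.inf  # tropical "zero": the identity element for min

def t_add(x, y):
    """Tropical addition: x (+) y = min(x, y)."""
    return min(x, y)

def t_mul(x, y):
    """Tropical multiplication: x (.) y = x + y."""
    return x + y

def t_matmul(A, B):
    """Tropical (min-plus) matrix product: C[i][j] = min_k (A[i][k] + B[k][j])."""
    inner = len(B)
    return [[min(t_mul(A[i][k], B[k][j]) for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical edge weights of a 3-node graph (INF = no edge, 0 on the diagonal).
W = [[0, 1, INF],
     [INF, 0, 2],
     [INF, INF, 0]]

W2 = t_matmul(W, W)  # cheapest paths using at most two edges
```

With these weights, `W2[0][2]` is the cost of the path 0 → 1 → 2, which does not exist as a single edge in `W`.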

Inference

From the observed random variables Y1 = σ1,…,Yn = σn

we want to infer values for the hidden random variables X1,…,Xm: unknown biological data, e.g.:

• How do two sequences align?

MAP estimation: given an observation σ1,…,σn, which is the most probable explanation X1 = h1,…,Xm = hm?

Model parameters give transition probabilities $p_{h\sigma}$ from a hidden state h to an observed state σ.

Observation: σ1,…,σn : Known biological data

Observation: σ1,…,σn

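We want to compute an explanation for the observation:

the sequence h1,…,hm which yields the maximum a posteriori probability (MAP):

$$\max_{h_1,\dots,h_m} P(X_1 = h_1, \dots, X_m = h_m, Y_1 = \sigma_1, \dots, Y_n = \sigma_n)$$

We can efficiently compute the marginal probabilities:

$$p_{\sigma_1 \cdots \sigma_n} = \sum_{h_1,\dots,h_m} P(X_1 = h_1, \dots, X_m = h_m, Y_1 = \sigma_1, \dots, Y_n = \sigma_n)$$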

Hidden Markov Model (HMM)

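Computation of the marginal probabilities:

$$p_{\sigma_1 \cdots \sigma_n} = \sum_{h_1,\dots,h_n} p_{h_1\sigma_1}\, p'_{h_1h_2}\, p_{h_2\sigma_2}\, p'_{h_2h_3} \cdots p'_{h_{n-1}h_n}\, p_{h_n\sigma_n}$$

$p_\sigma$ has the decomposition

$$p_{\sigma_1 \cdots \sigma_n} = \sum_{h_n} p_{h_n\sigma_n} \Bigl( \sum_{h_{n-1}} p'_{h_{n-1}h_n}\, p_{h_{n-1}\sigma_{n-1}} \Bigl( \cdots \Bigl( \sum_{h_1} p'_{h_1h_2}\, p_{h_1\sigma_1} \Bigr) \cdots \Bigr) \Bigr)$$

which gives the “Forward algorithm”.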
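
The nested-sum decomposition can be sketched in Python as follows. The names `trans` (the $p'$ matrix between hidden states), `emit` (the $p$ matrix from hidden to observed states), and the function names are assumptions made for illustration. No initial distribution is included, mirroring the formula as written. The brute-force version enumerates all hidden paths and is there only to check the recursion on small inputs.

```python
from itertools import product

def forward(trans, emit, obs):
    """Sum over all hidden paths of emit[h1][s1]*trans[h1][h2]*emit[h2][s2]*...

    trans[h][g]: probability of moving from hidden state h to hidden state g.
    emit[h][s]:  probability of observing symbol s in hidden state h.
    """
    states = range(len(trans))
    # f[h] = total weight of all partial paths that end in hidden state h
    f = [emit[h][obs[0]] for h in states]
    for s in obs[1:]:
        f = [emit[g][s] * sum(f[h] * trans[h][g] for h in states)
             for g in states]
    return sum(f)

def brute_force(trans, emit, obs):
    """Same quantity by explicit enumeration (exponential time, checking only)."""
    states = range(len(trans))
    total = 0.0
    for path in product(states, repeat=len(obs)):
        w = emit[path[0]][obs[0]]
        for i in range(1, len(obs)):
            w *= trans[path[i - 1]][path[i]] * emit[path[i]][obs[i]]
        total += w
    return total
```

The recursion costs on the order of n·l² operations for l hidden states, instead of the lⁿ terms of the full sum.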

Diagram labels: Markov chain (hidden states); independent probabilities (observations).

Viterbi algorithm

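Tropicalization of the problem of computing $p_\sigma$: $u_{ij} = -\log(p'_{ij})$, $v_{ij} = -\log(p_{ij})$

$$\min_{h_1,\dots,h_n} \bigl( v_{h_1\sigma_1} + u_{h_1h_2} + v_{h_2\sigma_2} + \cdots + u_{h_{n-1}h_n} + v_{h_n\sigma_n} \bigr)$$

We can now efficiently find an explanation h1,…,hn for the observation σ1,…,σn using the recursion:

$$\min_{h_n} \Bigl( v_{h_n\sigma_n} + \min_{h_{n-1}} \Bigl( u_{h_{n-1}h_n} + v_{h_{n-1}\sigma_{n-1}} + \cdots + \min_{h_1} \bigl( u_{h_1h_2} + v_{h_1\sigma_1} \bigr) \cdots \Bigr) \Bigr)$$

It is again the Forward algorithm, now carried out in the tropical semiring.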
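
A minimal Python sketch of this tropical recursion, with back-pointers added so that the explanation itself can be read off. The names `trans`, `emit` and `viterbi` are illustrative assumptions, and all probabilities are assumed to be strictly positive so that the logarithms exist.

```python
import math

def viterbi(trans, emit, obs):
    """Return (minimal cost, hidden path) for the tropicalized HMM.

    Costs are u = -log(trans) and v = -log(emit); sums of the forward
    algorithm become minima and products become sums.
    """
    states = range(len(trans))
    u = [[-math.log(p) for p in row] for row in trans]
    v = [[-math.log(p) for p in row] for row in emit]

    cost = [v[h][obs[0]] for h in states]  # best cost of a path ending in h
    back = []                              # back-pointers, one list per step
    for s in obs[1:]:
        prev = [min(states, key=lambda h: cost[h] + u[h][g]) for g in states]
        cost = [cost[prev[g]] + u[prev[g]][g] + v[g][s] for g in states]
        back.append(prev)

    last = min(states, key=lambda h: cost[h])
    path = [last]
    for prev in reversed(back):            # trace the optimal path backwards
        path.append(prev[path[-1]])
    path.reverse()
    return cost[last], path
```

Exponentiating the negated cost, `math.exp(-cost)`, recovers the probability weight of the returned path.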

Pair Hidden Markov Model (pHMM)

The algebraic statistical model for sequence alignment, known as the pair hidden Markov model, is the image of the map

where An,m is the set of all alignments of the sequences σ1, σ2.

• The Needleman-Wunsch algorithm for finding the shortest path in the alignment graph is the tropicalization of the pair hidden Markov model for sequence alignment.

Example: n=5, m=4, with the sequences gttta and gtgc and the alignment

gttta-
gt--gc
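
As a sketch of the shortest-path view, the following Python function computes the minimal alignment cost by dynamic programming over the alignment graph. The scoring scheme (match 0, mismatch 1, gap 1) and the function name are assumptions chosen for illustration. In the tropicalized pair HMM the weights would come from negative logarithms of the model parameters.

```python
def needleman_wunsch(a, b, mismatch=1, gap=1):
    """Minimal alignment cost of strings a and b (min-plus dynamic programming).

    D[i][j] is the cost of the cheapest path from the corner (0, 0) of the
    alignment graph to the node (i, j).
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap              # prefix of a aligned to gaps only
    for j in range(1, m + 1):
        D[0][j] = j * gap              # prefix of b aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub,  # match or mismatch
                          D[i - 1][j] + gap,      # gap in b
                          D[i][j - 1] + gap)      # gap in a
    return D[n][m]

cost = needleman_wunsch("gttta", "gtgc")
```

Under these assumed unit costs the alignment shown in the example is not optimal (it costs 4), which illustrates that the optimal alignment depends on the choice of parameters.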

The polytope propagation algorithm

• Tropical sum-product algorithm in general fashion.

f is the density function for a statistical model:

$$f(p) = \sum_{i=1}^{d} p_1^{e_{i1}} \cdots p_k^{e_{ik}}.$$

From the d monomials find the one of maximal value, i.e. solve

$$\max_{h_1,\dots,h_m} P(X_1 = h_1, \dots, X_m = h_m, Y_1 = \sigma_1, \dots, Y_n = \sigma_n)$$

Solution:
• Tropicalization: $w_i = -\log p_i$
• Computation in the polytope algebra

Density function for a statistical model: $f(p_1,p_2) = p_1^3 + p_1^2p_2^2 + p_1p_2^2 + p_1 + p_2^4$

• Find an explanation

• Find the index j of the monomial with maximal value

Tropicalization: $w_i = -\log p_i$

• Find the index j of the monomial that minimizes the function $e_j \cdot w$:

$$\min(3w_1,\ 2w_1 + 2w_2,\ w_1 + 2w_2,\ w_1,\ 4w_2)$$
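
For this example polynomial the tropical minimization can be sketched directly in Python. The function name `explanation` and the parameter values used to try it out are assumptions; the exponent vectors are read off from the monomials of f.

```python
import math

# Exponent vectors of f(p1, p2) = p1^3 + p1^2 p2^2 + p1 p2^2 + p1 + p2^4
EXPONENTS = [(3, 0), (2, 2), (1, 2), (1, 0), (0, 4)]

def explanation(p):
    """Index j of the monomial of maximal value at p = (p1, p2).

    Computed tropically: minimize e_j . w with w_i = -log(p_i).
    """
    w = [-math.log(x) for x in p]
    costs = [sum(e * wi for e, wi in zip(ev, w)) for ev in EXPONENTS]
    return min(range(len(EXPONENTS)), key=costs.__getitem__)

j = explanation((0.5, 0.5))  # hypothetical parameter values
```

For p = (0.5, 0.5) the five monomials evaluate to 0.125, 0.0625, 0.125, 0.5 and 0.0625, so the maximal one is p1, at index 3 (counting from 0).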

Explanations are vertices of the Newton Polytope of f

$f(p_1,p_2) = p_1^3 + p_1^2p_2^2 + p_1p_2^2 + p_1 + p_2^4$

We find a point for each exponent vector of a monomial: (3,0), (2,2), (1,2), (1,0), (0,4).
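
The Newton polytope of this example can be computed with the polytope algebra operations: each variable is replaced by a single point, products become Minkowski sums and sums become convex hulls of unions. The Python sketch below works in the plane only. All function names are illustrative assumptions, and the convex hull routine is the standard monotone chain method rather than anything prescribed by the slides.

```python
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of conv(points) in the plane, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def p_add(P, Q):
    """Polytope 'sum': convex hull of the union."""
    return convex_hull(list(P) + list(Q))

def p_mul(P, Q):
    """Polytope 'product': Minkowski sum."""
    return convex_hull([(a[0] + b[0], a[1] + b[1]) for a in P for b in Q])

def p_pow(P, k):
    R = [(0, 0)]                 # the origin is the identity for p_mul
    for _ in range(k):
        R = p_mul(R, P)
    return R

P1, P2 = [(1, 0)], [(0, 1)]      # polytopes standing in for p1 and p2
EXPONENTS = [(3, 0), (2, 2), (1, 2), (1, 0), (0, 4)]

newton = []                      # the empty polytope is the identity for p_add
for a, b in EXPONENTS:
    newton = p_add(newton, p_mul(p_pow(P1, a), p_pow(P2, b)))
```

The result has the four vertices (0,4), (1,0), (3,0) and (2,2). The point (1,2) lies in the interior, so the monomial p1·p2² is never the unique explanation, whatever the parameters.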

Normal fan

• The normal fan partitions the parameter space into regions such that the explanation(s) for all sets of parameters in a given region are given by the polytope vertex (or face) associated to that region.

Parametric MAP estimation problem

• Local: given a choice of parameters, determine the set of all parameters with the same MAP estimate.

• Solution: Computation of the normal cone of the Newton Polytope.

• Global: find a partition of the space of parameters such that any two parameters lie in the same part iff they yield the same MAP estimate.

• Solution: Computation of the normal fan of the Newton Polytope.
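
For a polygon the normal cone of a vertex is cut out by just two inequalities, one for each neighbouring vertex. The Python sketch below uses the Newton polygon of f(p1,p2) = p1³ + p1²p2² + p1p2² + p1 + p2⁴, whose vertices in counter-clockwise order are (0,4), (1,0), (3,0), (2,2). The function names are illustrative assumptions, and the cones follow the minimization convention w_i = -log p_i.

```python
import math

# Vertices of the Newton polygon of the example, counter-clockwise.
VERTICES = [(0, 4), (1, 0), (3, 0), (2, 2)]

def normal_cone(i):
    """Vectors d such that {w : d . w >= 0 for all d} is the cone of
    weight vectors w for which VERTICES[i] minimizes e . w.

    For a convex polygon, comparing with the two neighbours suffices.
    """
    v = VERTICES[i]
    nbrs = (VERTICES[i - 1], VERTICES[(i + 1) % len(VERTICES)])
    return [(u[0] - v[0], u[1] - v[1]) for u in nbrs]

def in_cone(i, w):
    return all(d[0] * w[0] + d[1] * w[1] >= 0 for d in normal_cone(i))

def map_vertex(p):
    """Vertex (explanation) selected by parameters p = (p1, p2)."""
    w = tuple(-math.log(x) for x in p)
    return min(VERTICES, key=lambda v: v[0] * w[0] + v[1] * w[1])
```

The local problem then amounts to reporting `normal_cone(i)` for the vertex selected by the given parameters. The global problem amounts to listing the cones of all vertices, which together form the normal fan.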
