Parameter Estimation in Mixtures of Truncated Exponentials
Helge Langseth1 Thomas D. Nielsen2 Rafael Rumí3
Antonio Salmerón3
1Dept. of Computer and Information Science, The Norwegian University of Science and Technology, Norway
2Dept. of Computer Science, Aalborg University, Denmark
3Dept. of Statistics and Applied Mathematics, University of Almería, Spain
PGM, September 2008
1 Langseth, Nielsen, Rumí and Salmerón Parameter estimation in MTEs
Outline

1 Background
  Motivation
  Mixtures of Truncated Exponentials
2 Learning MTEs from data
  Background
  Maximum likelihood estimation in MTEs
  Constrained optimisation and Lagrange multipliers
  The Newton-Raphson method
  The initialisation procedure
3 Model selection
  Locating splitpoints
  Determining model complexity
4 Conclusions
Mixtures of Truncated Exponentials
[Figure: the density f(z) of Z, plotted over [−3, 3], and the logistic curve P(Y = 1 | z), plotted over [−6, 6].]
Calculate P(Y = 1) in Hugin: “Illegal link”

f(z) = (1/√(2π)) · exp(−z²/2)
P(Y = 1 | z) = 1/(1 + exp(−z))
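The exact answer is P(Y = 1) = 1/2, since the integrand f(z) · P(Y = 1 | z) pairs symmetrically around z = 0. A quick numerical check (a sketch of ours, not part of the paper; function names are illustrative):

```python
import math

# Numerical check of P(Y = 1) = ∫ f(z) · P(Y = 1 | z) dz, with f the
# standard normal density and a logistic link.

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_y1(lo=-8.0, hi=8.0, n=4000):
    # trapezoidal quadrature of phi(z) * sigmoid(z) over [lo, hi]
    h = (hi - lo) / n
    total = 0.5 * (phi(lo) * sigmoid(lo) + phi(hi) * sigmoid(hi))
    total += sum(phi(lo + i * h) * sigmoid(lo + i * h) for i in range(1, n))
    return h * total

print(round(p_y1(), 4))  # 0.5, exactly as the symmetry argument predicts
```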
Mixtures of Truncated Exponentials
Calculate P(Y = 1) with MTEs: P(Y = 1) ≈ 0.4996851

f(z) =
  −0.0172 + 0.931 · exp(1.27z)    if −3 ≤ z < −1
  0.442 − 0.0385 · exp(−1.64z)    if −1 ≤ z < 0
  0.442 − 0.0385 · exp(1.64z)     if 0 ≤ z < 1
  −0.0172 + 0.9314 · exp(−1.27z)  if 1 ≤ z < 3

P(Y = 1 | z) =
  0                               if z < −5
  −0.0217 + 0.522 · exp(0.635z)   if −5 ≤ z < 0
  1.0217 − 0.522 · exp(−0.635z)   if 0 ≤ z ≤ 5
  1                               if z > 5
The MTE model
Definition (Univariate MTE potential over a continuous variable)
Let Z be a continuous variable. A function f : Ω_Z → ℝ⁺₀ is an MTE potential over Z

1 if
  f(z) = a₀ + ∑_{i=1}^{m} a_i · exp(b_i · z)
  for all z ∈ Ω_Z, where the a_i, b_i are real numbers,
2 . . . or if there is a partition of Ω_Z into intervals I₁, . . . , I_k s.t. f is defined as above on each I_j.
Generalization to arbitrary hybrid domains (Moral et al. 2001)
The definition transfers to multivariate domains containing both continuous and discrete variables.
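As a minimal illustration of the definition, an MTE potential can be stored as a list of pieces and evaluated pointwise. This sketch is ours (the data structure and names are illustrative, not from the paper); it uses the 4-piece approximation of the standard normal density shown earlier:

```python
import math

# An MTE potential as a list of pieces (low, high, a0, [(a_i, b_i), ...]),
# evaluated pointwise over its partition of the domain.

def mte_eval(pieces, z):
    for low, high, a0, terms in pieces:
        if low <= z < high:
            return a0 + sum(a * math.exp(b * z) for a, b in terms)
    return 0.0  # outside the support

# The 4-piece MTE approximation of the standard normal density from the
# earlier slide:
normal_mte = [
    (-3.0, -1.0, -0.0172, [(0.931, 1.27)]),
    (-1.0, 0.0, 0.442, [(-0.0385, -1.64)]),
    (0.0, 1.0, 0.442, [(-0.0385, 1.64)]),
    (1.0, 3.0, -0.0172, [(0.9314, -1.27)]),
]
print(round(mte_eval(normal_mte, 0.0), 4))  # 0.4035, close to N(0,1)'s 0.3989
```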
Learning MTEs from data
[Figure: histogram of the sample data over [0, 1].]
The MTE learning problem
How to find the MTE-distribution that generated this data?
Learning MTEs from data
The learning task involves three basic steps:
1 Determine the intervals into which Ω_Z will be partitioned.
2 Determine the number of exponential terms in the mixture for each interval.
3 Estimate the parameters.

Simplifying assumptions
In this work we are concerned with the univariate case. For simplicity we will initially assume that:
The intervals into which Ω_Z will be partitioned are known;
The number of exponential terms in the mixture for each interval is fixed to 2, giving target density
  f(z) = k + a · exp(b · z) + c · exp(d · z).
Learning MTEs from data by Maximum Likelihood
Why learn MTEs using Maximum Likelihood?
Well developed core theory, incl. good asymptotic properties under regularity conditions.
ML parameters give access to a variety of model estimation procedures:
  LRT or BIC for selecting the number of exponential terms;
  Likelihood maximisation to locate split-points.

Problems
The likelihood equations cannot be solved analytically.
Identifiability of parameters.
Initial observations
We will assume target density
  f(z | θ_j) = k_j + a_j · exp(b_j · z) + c_j · exp(d_j · z),  z ∈ I_j
for interval I_j; θ_j = {k_j, a_j, b_j, c_j, d_j}.

Denote by n_j the number of observations from interval I_j and let N = ∑_j n_j. Then the ML solution θ_j must satisfy
  ∫_{z ∈ I_j} f(z | θ_j) dz = n_j / N.    (1)

Parameter independence
θ_k can be found independently of θ_l as long as Equation (1) is satisfied for all θ_j. We will therefore look at a single interval I from now on (and drop the index j when appropriate).
Constrained optimisation
Maximize
  log L(θ | z) = ∑_{i : z_i ∈ I} log L(θ | z_i) = ∑_{i : z_i ∈ I} log f(z_i | θ)
Subject to
  ∫_{z ∈ I} f(z | θ) dz − n/N = 0,
  f(e1 | θ) ≥ 0,
  f(e2 | θ) ≥ 0.
Constrained optimisation
Maximize
  log L(θ | z) = ∑_{i : z_i ∈ I} log L(θ | z_i) = ∑_{i : z_i ∈ I} log f(z_i | θ)
Subject to
  ∫_{z ∈ I} f(z | θ) dz − n/N = 0,
  f(e1 | θ) − s1² = 0,
  f(e2 | θ) − s2² = 0,
where the slack variables s1, s2 turn the inequality constraints f(e1 | θ) ≥ 0 and f(e2 | θ) ≥ 0 into equalities.

Notation:
  φ = [θᵀ sᵀ]ᵀ,  ψ = [θᵀ sᵀ λᵀ]ᵀ = [φᵀ λᵀ]ᵀ,
  g0(φ) = ∫_{z ∈ I} f(z | θ) dz − n/N,
  g1(φ) = f(e1 | θ) − s1²,  g2(φ) = f(e2 | θ) − s2².

Lagrange multipliers
Find the root of ∇_ψ (log L(θ | z) + λᵀ g(φ)) to solve the constrained optimisation problem.
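To make the root-finding step concrete, here is a toy equality-constrained problem solved by the same recipe, finding the root of ∇_ψ(f + λᵀg) with Newton's method. The problem and all names are illustrative choices of ours, not the MTE likelihood itself:

```python
# Toy problem: maximise f(x, y) = -(x^2 + y^2) subject to
# g(x, y) = x + y - 1 = 0, by finding the root of
# A(psi) = grad_{x, y, lambda} (f + lambda * g).

def A(psi):
    x, y, lam = psi
    return [-2 * x + lam, -2 * y + lam, x + y - 1]

def jacobian(psi, eps=1e-6):
    # forward-difference approximation of the Jacobian of A
    base = A(psi)
    cols = []
    for j in range(3):
        bumped = list(psi)
        bumped[j] += eps
        col = A(bumped)
        cols.append([(col[i] - base[i]) / eps for i in range(3)])
    # transpose so that J[i][j] = dA_i / dpsi_j
    return [[cols[j][i] for j in range(3)] for i in range(3)]

def solve3(M, b):
    # Gaussian elimination with partial pivoting on a 3x3 system
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

psi = [0.0, 0.0, 0.0]          # initial guess
for _ in range(20):            # psi <- psi - J^{-1} A(psi)
    step = solve3(jacobian(psi), A(psi))
    psi = [p - s for p, s in zip(psi, step)]

print([round(v, 6) for v in psi])  # constrained maximum at x = y = 0.5
```

Because A is linear here, the iteration converges in a single step; on the MTE log-likelihood A is nonlinear and several iterations are needed.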
The Newton-Raphson method

Example: Find x s.t. h(x) = 0.
Initial “guess”: x = x₀; approximate h(x) by its tangent in x₀.
New “guess” x₁: the point where the tangent crosses the abscissa.
Iterate using the general formula x_{t+1} ← x_t − h′(x_t)⁻¹ · h(x_t).
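The scalar iteration can be sketched in a few lines (a generic implementation of ours, not code from the paper):

```python
# Generic scalar Newton-Raphson iteration x_{t+1} = x_t - h(x_t) / h'(x_t).

def newton(h, h_prime, x0, tol=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = h(x) / h_prime(x)
        x -= step
        if abs(step) < tol:  # converged: the update is negligible
            break
    return x

# Root of h(x) = x^2 - 2 starting from x0 = 1 converges to sqrt(2)
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(round(root, 6))  # 1.414214
```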
The Lagrange Multipliers method
Maximise likelihood given constraints
Use the multivariate Newton-Raphson method to solve A(ψ | z) ≡ ∇_ψ (log L(θ | z) + λᵀ g(φ)) = 0:
  ψ_{t+1} ← ψ_t − J(A(ψ_t | z))⁻¹ · A(ψ_t | z).

Initialisation of Newton-Raphson:
Choose θ₀ “randomly”, giving s₀ = [√f(e1 | θ₀)  √f(e2 | θ₀)]ᵀ;
λ₀ = [1 1]ᵀ (chosen rather arbitrarily).
Example-run, Lagrange multipliers
[Figure: likelihood surface over (b₀, d₀) ∈ [−6, 6]².]
Likelihood of example data D = {z₁, . . . , z_n}; value of point (b₀, d₀) given as
  max_{k,a,c} ∏_i (k + a · exp(b₀ · z_i) + c · exp(d₀ · z_i)).
Initialisation of the Newton-Raphson method
Initialisation procedure – Main idea

Instead of maximising over 5 parameters under the constraint
  ∫_{z ∈ I} f(z | θ) dz = n/N,
we iteratively maximise over pairs of parameters.
One parameter is varied freely; the other is chosen to make sure that the constraint is fulfilled.
A high-dimensional constrained optimisation problem is thus replaced by a series of “unconstrained” optimisation problems, each in one dimension.
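A sketch of one such pairwise step, under simplifying assumptions of ours: on I = [0, 1] with a single exponential term f(z) = k + a · exp(b · z) and taking n/N = 1, the constraint fixes k = 1 − a(e^b − 1)/b, so maximising over a alone is a one-dimensional problem (grid search here for simplicity; all names and the toy data are illustrative):

```python
import math

# One pairwise initialisation step: vary `a` freely, solve `k` from the
# mass constraint  ∫_[0,1] (k + a*exp(b*z)) dz = mass,
# i.e.  k = mass - a * (e^b - 1) / b.

def k_from_constraint(a, b, mass=1.0):
    return mass - a * (math.exp(b) - 1.0) / b

def loglik(data, k, a, b):
    total = 0.0
    for z in data:
        fz = k + a * math.exp(b * z)
        if fz <= 0.0:               # infeasible: density must stay positive
            return float("-inf")
        total += math.log(fz)
    return total

def maximise_a(data, b, grid):
    # one-dimensional grid search over a; k tracks a through the constraint
    best = max(grid, key=lambda a: loglik(data, k_from_constraint(a, b), a, b))
    return best, k_from_constraint(best, b)

data = [0.05, 0.1, 0.2, 0.3, 0.55, 0.7, 0.9]
a, k = maximise_a(data, b=1.0, grid=[i / 100 - 1.0 for i in range(201)])
print(round(k + a * (math.e - 1.0), 6))  # mass is preserved: 1.0
```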
Initialisation algorithm
Initialisation: Choose some “random” starting values for θ, making sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
[Diagram: the three terms of the density — the constant k, a · exp(b · z), and c · exp(d · z).]
Initialisation algorithm
Maximise over a; compensate using k:
k is determined by a to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  a ← argmax_{a′} ∑_{i : z_i ∈ I} log f(z_i | k′ = func(θ, a′), a′, θ).
Initialisation algorithm
Maximise over c; compensate using k:
k is determined by c to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  c ← argmax_{c′} ∑_{i : z_i ∈ I} log f(z_i | k′ = func(θ, c′), c′, θ).
Initialisation algorithm
Maximise over b; compensate using a:
a is determined by b to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  b ← argmax_{b′} ∑_{i : z_i ∈ I} log f(z_i | a′ = func(θ, b′), b′, θ).
Initialisation algorithm
Maximise over d; compensate using c:
c is determined by d to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  d ← argmax_{d′} ∑_{i : z_i ∈ I} log f(z_i | c′ = func(θ, d′), d′, θ).
Initialisation algorithm
Check for convergence:
At this point all parameters have been updated at least once.
Calculate the likelihood and check if there is a significant improvement.
If improved then iterate again, otherwise return.
Example run (with initialisation)
[Figure: likelihood surface over (b₀, d₀) ∈ [−6, 6]².]
Model selection: Split-point for dataset
New data-set: 50 samples from the standard Normal distribution
[Figure: likelihood of the data using ML estimators for different split-points, over split-point locations in [−2.5, 2].]
Model selection: No. parameters per interval
[Figure: fitted density over the data sample, on [−2.5, 2].]
Interval [−2.5200, −0.1303〉:
Constant term: L(θ₁ | z) = −77.641.
1 exponential term: L(θ₁ | z) = −55.317 ⇒ p = 0.000.
2 exponential terms: L(θ₁ | z) = −55.314 ⇒ p = 0.996.
Model selection: No. parameters per interval
[Figure: fitted density over the data sample, on [−2.5, 2].]
Interval [−0.1303, 2.2368〉:
Constant term: L(θ₂ | z) = −77.742.
1 exponential term: L(θ₂ | z) = −64.490 ⇒ p = 0.000.
2 exponential terms: L(θ₂ | z) = −64.490 ⇒ p = 1.000.
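These p-values are consistent with a likelihood-ratio test in which adding one exponential term adds two free parameters (a_i and b_i), so the test statistic is referred to a χ² distribution with 2 degrees of freedom, whose tail probability is exp(−x/2). A sketch (the function name is ours); using the rounded log-likelihoods from the slides reproduces the reported p-values up to rounding:

```python
import math

# Likelihood-ratio test for adding one exponential term: two extra free
# parameters give a chi-square reference distribution with 2 degrees of
# freedom, whose tail probability is exp(-x / 2).

def lrt_pvalue(loglik_small, loglik_big):
    statistic = 2.0 * (loglik_big - loglik_small)
    return math.exp(-statistic / 2.0)

# Log-likelihoods from the first interval on the slide:
print(round(lrt_pvalue(-77.641, -55.317), 3))  # constant vs 1 term: 0.0
print(round(lrt_pvalue(-55.317, -55.314), 3))  # 1 term vs 2 terms: 0.997
```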
Conclusions
We have described an efficient method for learning ML estimates of univariate MTEs.
ML estimates are fairly robust; the improvement over the traditional (regression-based) method is substantial.
ML estimates can be used for model selection:
  No. exponential terms in each interval;
  Number of split-points, and their location.
Ongoing work: Extension to conditional distributions:
  Learning parameters of conditional distributions (“solved”).
  Locating split-points (difficult; some progress has been made).