Parameter Estimation in Mixtures of Truncated Exponentials

Helge Langseth¹, Thomas D. Nielsen², Rafael Rumí³, Antonio Salmerón³

¹ Dept. of Computer and Information Science, The Norwegian University of Science and Technology, Norway
² Dept. of Computer Science, Aalborg University, Denmark
³ Dept. of Statistics and Applied Mathematics, University of Almería, Spain

PGM, September 2008

1 Langseth, Nielsen, Rumí and Salmerón Parameter estimation in MTEs


Outline

1 Background
    Motivation
    Mixtures of Truncated Exponentials

2 Learning MTEs from data
    Background
    Maximum likelihood estimation in MTEs
    Constrained optimisation and Lagrange multipliers
    The Newton-Raphson method
    The initialisation procedure

3 Model selection
    Locating split-points
    Determining model complexity

4 Conclusions


Mixtures of Truncated Exponentials

[Figure: the density f(z) of Z on [−3, 3] and P(Y = 1 | z) on [−6, 6]; network Z → Y]

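Calculate P(Y = 1) in Hugin: “Illegal link” (Hugin does not allow a discrete node to have a continuous parent).

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} z^2\right), \qquad P(Y = 1 \mid z) = \frac{1}{1 + \exp(-z)}.$$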


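Calculate P(Y = 1) with MTEs: P(Y = 1) ≈ 0.4996851

$$
f(z) = \begin{cases}
-0.0172 + 0.931\,e^{1.27z} & \text{if } -3 \le z < -1,\\
0.442 - 0.0385\,e^{-1.64z} & \text{if } -1 \le z < 0,\\
0.442 - 0.0385\,e^{1.64z} & \text{if } 0 \le z < 1,\\
-0.0172 + 0.931\,e^{-1.27z} & \text{if } 1 \le z < 3,
\end{cases}
$$

$$
P(Y = 1 \mid z) = \begin{cases}
0 & \text{if } z < -5,\\
-0.0217 + 0.522\,e^{0.635z} & \text{if } -5 \le z < 0,\\
1.0217 - 0.522\,e^{-0.635z} & \text{if } 0 \le z \le 5,\\
1 & \text{if } z > 5.
\end{cases}
$$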
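Because products and integrals of MTE pieces are again available in closed form, P(Y = 1) = ∫ f(z) P(Y = 1 | z) dz can be computed exactly from the two piecewise approximations above. The sketch below is illustrative only: the list-of-(coefficient, exponent) representation and the helper names are assumptions, and it uses the rounded coefficients as printed, taking 0.931 for both outer pieces of f, so it matches the reported 0.4996851 only approximately.

```python
import math

# Hypothetical representation: an MTE piece is a list of (coefficient, exponent)
# pairs meaning sum_i coef_i * exp(exponent_i * z); a constant uses exponent 0.0.

def integrate_terms(terms, lo, hi):
    """Closed-form integral of sum_i a_i * exp(b_i * z) over [lo, hi]."""
    total = 0.0
    for a, b in terms:
        if b == 0.0:
            total += a * (hi - lo)
        else:
            total += a / b * (math.exp(b * hi) - math.exp(b * lo))
    return total

def multiply_terms(p, q):
    """The product of two MTE pieces is again an MTE piece: exponents add."""
    return [(a1 * a2, b1 + b2) for a1, b1 in p for a2, b2 in q]

# MTE approximation of the N(0, 1) density, as (lo, hi, terms).
f_pieces = [
    (-3.0, -1.0, [(-0.0172, 0.0), (0.931, 1.27)]),
    (-1.0, 0.0, [(0.442, 0.0), (-0.0385, -1.64)]),
    (0.0, 1.0, [(0.442, 0.0), (-0.0385, 1.64)]),
    (1.0, 3.0, [(-0.0172, 0.0), (0.931, -1.27)]),
]

def logistic_piece(lo, hi):
    """MTE approximation of P(Y = 1 | z) for a sub-interval of [-5, 5]."""
    # No f-piece straddles 0, so each one lies inside a single logistic piece.
    if hi <= 0.0:
        return [(-0.0217, 0.0), (0.522, 0.635)]
    return [(1.0217, 0.0), (-0.522, -0.635)]

mass = sum(integrate_terms(terms, lo, hi) for lo, hi, terms in f_pieces)
p_y1 = sum(
    integrate_terms(multiply_terms(terms, logistic_piece(lo, hi)), lo, hi)
    for lo, hi, terms in f_pieces
)
print(f"total mass = {mass:.5f}, P(Y = 1) = {p_y1:.5f}")
```

Since this f is symmetric and the two logistic pieces satisfy P(Y = 1 | z) + P(Y = 1 | −z) = 1, the result equals half the total mass of f, which is slightly below 1 for the rounded coefficients.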


The MTE model

Definition (Univariate MTE potential over a continuous variable)

Let Z be a continuous variable. A function $f : \Omega_Z \to \mathbb{R}_0^+$ is an MTE potential over Z

1. if
$$f(z) = a_0 + \sum_{i=1}^{m} a_i \exp(b_i \cdot z)$$
for all $z \in \Omega_Z$, where $a_i, b_i$ are real numbers;
2. or if there is a partition of $\Omega_Z$ into intervals $I_1, \ldots, I_k$ such that f is defined as above on each $I_j$.

Generalization to arbitrary hybrid domains (Moral et al. 2001)

The definition transfers to multivariate domains containing both continuous and discrete variables.
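A minimal sketch of the univariate definition as a data structure, assuming hypothetical class names `MTEPiece` and `UnivariateMTE`: each piece stores a_0 and the pairs (a_i, b_i), and the potential evaluates the piece whose interval contains z. The half-open interval convention is an assumption, and the sketch does not check the requirement f ≥ 0. The example uses the four-piece approximation of the standard Normal density shown earlier, taking 0.931 in both outer pieces.

```python
import bisect
import math
from dataclasses import dataclass, field

@dataclass
class MTEPiece:
    """One piece: a0 + sum_i a_i * exp(b_i * z)."""
    a0: float
    terms: list = field(default_factory=list)  # [(a_i, b_i), ...]

    def __call__(self, z):
        return self.a0 + sum(a * math.exp(b * z) for a, b in self.terms)

@dataclass
class UnivariateMTE:
    """Piecewise MTE potential: pieces[j] applies on [cuts[j], cuts[j+1])."""
    cuts: list
    pieces: list

    def __post_init__(self):
        if len(self.cuts) != len(self.pieces) + 1:
            raise ValueError("need exactly one more cut point than pieces")

    def __call__(self, z):
        if not self.cuts[0] <= z <= self.cuts[-1]:
            raise ValueError(f"z = {z} lies outside the domain")
        j = bisect.bisect_right(self.cuts, z) - 1
        j = min(j, len(self.pieces) - 1)  # the right end-point belongs to the last piece
        return self.pieces[j](z)

# Four-piece MTE approximation of the standard Normal density.
f = UnivariateMTE(
    cuts=[-3.0, -1.0, 0.0, 1.0, 3.0],
    pieces=[
        MTEPiece(-0.0172, [(0.931, 1.27)]),
        MTEPiece(0.442, [(-0.0385, -1.64)]),
        MTEPiece(0.442, [(-0.0385, 1.64)]),
        MTEPiece(-0.0172, [(0.931, -1.27)]),
    ],
)
print(f(0.0), f(-2.0), f(3.0))
```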


Learning MTEs from data

[Figure: the example data; horizontal axis from 0 to 1, vertical axis from 0 to 15]

The MTE learning problem

How can we find the MTE distribution that generated these data?


Learning MTEs from data

The learning task involves three basic steps:
1. Determine the intervals into which ΩZ will be partitioned.
2. Determine the number of exponential terms in the mixture for each interval.
3. Estimate the parameters.

Simplifying assumptions

In this work we are concerned with the univariate case. For simplicity, we will initially assume that:
• the intervals into which ΩZ will be partitioned are known;
• the number of exponential terms in the mixture for each interval is fixed to 2, giving the target density

f(z) = k + a · exp(b · z) + c · exp(d · z).


Learning MTEs from data by Maximum Likelihood

Why learn MTEs using Maximum Likelihood?
• Well-developed core theory, including good asymptotic properties under regularity conditions.
• ML parameters give access to a variety of model estimation procedures:
   – LRT or BIC for selecting the number of exponential terms;
   – likelihood maximisation to locate split-points.

Problems
• The likelihood equations cannot be solved analytically.
• Identifiability of parameters.


Initial observations

We will assume the target density

$$f(z \mid \theta_j) = k_j + a_j \exp(b_j \cdot z) + c_j \exp(d_j \cdot z), \qquad z \in I_j$$

for interval $I_j$; $\theta_j = \{k_j, a_j, b_j, c_j, d_j\}$.

Denote by $n_j$ the number of observations from interval $I_j$ and let $N = \sum_j n_j$. Then the ML solution $\theta_j$ must satisfy

$$\int_{z \in I_j} f(z \mid \theta_j)\,dz = n_j/N. \qquad (1)$$

Parameter independence

$\theta_k$ can be found independently of $\theta_l$ as long as Equation (1) is satisfied for all $\theta_j$. We will therefore look at a single interval I from now on (and drop the index j when appropriate).
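For the two-term density, the left-hand side of Equation (1) has a closed form. Writing the interval as $I_j = [e_1, e_2]$ (notation assumed here) and assuming $b_j, d_j \neq 0$:

```latex
\int_{e_1}^{e_2} f(z \mid \theta_j)\,dz
  = k_j (e_2 - e_1)
  + \frac{a_j}{b_j}\left(e^{b_j e_2} - e^{b_j e_1}\right)
  + \frac{c_j}{d_j}\left(e^{d_j e_2} - e^{d_j e_1}\right)
  = \frac{n_j}{N}.
```

For fixed exponents $b_j$ and $d_j$, this is a single linear equation in $k_j$, $a_j$ and $c_j$.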


Constrained optimisation

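Maximize

$$\log L(\theta \mid \mathbf{z}) = \sum_{i:\, z_i \in I} \log L(\theta \mid z_i) = \sum_{i:\, z_i \in I} \log f(z_i \mid \theta)$$

subject to

$$\int_{z \in I} f(z \mid \theta)\,dz - n/N = 0, \qquad f(e_1 \mid \theta) \ge 0, \qquad f(e_2 \mid \theta) \ge 0,$$

where $e_1$ and $e_2$ are the end-points of I.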


Constrained optimisation

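The objective is unchanged; the inequality constraints are rewritten as equalities with slack variables $s_1$ and $s_2$:

$$\int_{z \in I} f(z \mid \theta)\,dz - n/N = 0, \qquad f(e_1 \mid \theta) - s_1^2 = 0, \qquad f(e_2 \mid \theta) - s_2^2 = 0.$$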

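Notation:

$$\phi = [\theta^T\ s^T]^T, \qquad \psi = [\theta^T\ s^T\ \lambda^T]^T = [\phi^T\ \lambda^T]^T,$$

$$g_0(\phi) = \int_{z \in I} f(z \mid \theta)\,dz - n/N, \qquad g_1(\phi) = f(e_1 \mid \theta) - s_1^2, \qquad g_2(\phi) = f(e_2 \mid \theta) - s_2^2.$$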

Lagrange multipliers

Find the root of $\nabla_\psi \left(\log L(\theta \mid \mathbf{z}) + \lambda^T g(\phi)\right)$ to solve the constrained optimisation problem.

The Newton-Raphson method

Example: find x such that h(x) = 0.
• Initial “guess”: x = x₀; approximate h(x) by its tangent at x₀.
• New “guess” x₁: the point where the tangent crosses the abscissa.
• Iterate using the general formula

$$x_{t+1} \leftarrow x_t - h'(x_t)^{-1} \cdot h(x_t).$$
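A minimal one-dimensional sketch of the update rule; the function names and the stopping rule (step size below a tolerance) are illustrative choices, not taken from the slides.

```python
import math

def newton_raphson(h, h_prime, x0, tol=1e-12, max_iter=50):
    """Find x with h(x) = 0 using x_{t+1} = x_t - h'(x_t)^{-1} * h(x_t)."""
    x = x0
    for _ in range(max_iter):
        # The tangent at x_t crosses the abscissa at x_t - h(x_t) / h'(x_t).
        step = h(x) / h_prime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Example: the positive root of h(x) = x^2 - 2, starting from x0 = 1.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root, math.sqrt(2.0))
```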


The Lagrange Multipliers method

Maximise likelihood given constraints

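Use the multivariate Newton-Raphson method to solve $A(\psi \mid \mathbf{z}) \equiv \nabla_\psi \left(\log L(\theta \mid \mathbf{z}) + \lambda^T g(\phi)\right) = 0$:

$$\psi_{t+1} \leftarrow \psi_t - J\big(A(\psi_t \mid \mathbf{z})\big)^{-1} \cdot A(\psi_t \mid \mathbf{z}),$$

where $J(A)$ is the Jacobian of A with respect to ψ.

Initialisation of Newton-Raphson:
• Choose θ₀ “randomly”, giving $s_0 = \left[\sqrt{f(e_1 \mid \theta_0)}\ \ \sqrt{f(e_2 \mid \theta_0)}\right]^T$;
• $\lambda_0 = [1\ 1]^T$ (chosen rather arbitrarily).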
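The slides do not spell out A and J for the MTE likelihood, so the sketch below applies the same update to a deliberately simple stand-in problem: ML estimation of category probabilities p from counts n under the single equality constraint Σ p_i = 1 (the analogue of Equation (1)), with no inequality constraints or slack variables. The problem, the function name `lagrange_newton`, and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def lagrange_newton(counts, p0, lam0=1.0, tol=1e-12, max_iter=50):
    """Maximise sum_i n_i log p_i subject to sum_i p_i = 1.

    Newton-Raphson is applied to A(psi), the gradient of the Lagrangian
    sum_i n_i log p_i + lam * (sum_i p_i - 1), with psi = [p, lam].
    """
    n = np.asarray(counts, dtype=float)
    k = n.size
    psi = np.append(np.asarray(p0, dtype=float), lam0)
    for _ in range(max_iter):
        p, lam = psi[:k], psi[k]
        A = np.append(n / p + lam, p.sum() - 1.0)
        if np.linalg.norm(A) < tol:
            break
        J = np.zeros((k + 1, k + 1))  # Jacobian of A with respect to psi
        J[:k, :k] = np.diag(-n / p**2)
        J[:k, k] = 1.0
        J[k, :k] = 1.0
        psi = psi - np.linalg.solve(J, A)
    return psi[:k], psi[k]

p_hat, lam_hat = lagrange_newton([2, 3, 5], p0=[1 / 3, 1 / 3, 1 / 3])
print(p_hat, lam_hat)
```

Starting from λ₀ = 1, the iterates converge to p = n/N and λ = −N, the known closed-form solution of this toy problem.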


Example run, Lagrange multipliers

[Figure: example run of the Lagrange-multiplier method; both axes range from −6 to 6]

Likelihood of example data D = {z₁, …, zₙ}; the value of point (b₀, d₀) is given as

$$\max_{k,a,c} \prod_i \left(k + a \exp(b_0 \cdot z_i) + c \exp(d_0 \cdot z_i)\right).$$


Initialisation of the Newton-Raphson method

Initialization procedure – Main idea

• Instead of maximising over 5 parameters under the constraint $\int_{z \in I} f(z \mid \theta)\,dz = n/N$, we iteratively maximise over pairs of parameters.
• One parameter is varied freely; the other is chosen to make sure that the constraint is fulfilled.
• A high-dimensional constrained optimisation problem is thus replaced by a series of “unconstrained” optimisation problems, each in one dimension.


Initialisation algorithm

Initialisation: choose some “random” starting values for θ, making sure that $\int_{z \in I} f(z \mid \theta)\,dz = n/N$.

[Figure: the three components of the density: constant k, a · exp(b · z) and c · exp(d · z)]


Initialisation algorithm

Maximise over a; compensate using k: k is determined by a to make sure that $\int_{z \in I} f(z \mid \theta)\,dz = n/N$.

$$a \leftarrow \arg\max_{a'} \sum_{i:\, z_i \in I} \log f\big(z_i \mid k' = \mathrm{func}(\theta, a'),\, a',\, \theta\big).$$
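A minimal sketch of this first coordinate step for the density k + a·exp(b·z) + c·exp(d·z) on I = [lo, hi]. The slides do not say how each one-dimensional maximisation is carried out; golden-section search on a user-supplied bracket is an assumption here, as are all function names (`k_from_constraint` plays the role of the slides' func). Points where f(e₁), f(e₂) or some f(zᵢ) would be negative get log-likelihood −∞, mirroring the constraints f(e₁|θ) ≥ 0 and f(e₂|θ) ≥ 0.

```python
import math

def exp_integral(b, lo, hi):
    """Integral of exp(b * z) over [lo, hi]."""
    if b == 0.0:
        return hi - lo
    return (math.exp(b * hi) - math.exp(b * lo)) / b

def density(z, k, a, b, c, d):
    return k + a * math.exp(b * z) + c * math.exp(d * z)

def k_from_constraint(a, b, c, d, lo, hi, mass):
    """Hypothetical func(theta, a'): choose k so the density integrates to mass = n/N."""
    return (mass - a * exp_integral(b, lo, hi) - c * exp_integral(d, lo, hi)) / (hi - lo)

def loglik(theta, data, lo, hi):
    """Log-likelihood; -inf if f(e1) < 0, f(e2) < 0 or some f(z_i) <= 0."""
    if density(lo, *theta) < 0.0 or density(hi, *theta) < 0.0:
        return -math.inf
    total = 0.0
    for z in data:
        fz = density(z, *theta)
        if fz <= 0.0:
            return -math.inf
        total += math.log(fz)
    return total

def golden_max(func, lo, hi, iters=80):
    """Golden-section search for the maximiser of a unimodal func on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = func(x1), func(x2)
    for _ in range(iters):
        if f1 < f2:  # maximiser lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = func(x2)
        else:  # maximiser lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = func(x1)
    return (lo + hi) / 2.0

def update_a(theta, data, lo, hi, mass, a_range):
    """One step: maximise over a, with k recomputed from the constraint."""
    _, a, b, c, d = theta

    def profile(a_new):
        k_new = k_from_constraint(a_new, b, c, d, lo, hi, mass)
        return loglik((k_new, a_new, b, c, d), data, lo, hi)

    a_best = golden_max(profile, *a_range)
    if profile(a_best) < profile(a):  # never accept a worse point
        a_best = a
    return (k_from_constraint(a_best, b, c, d, lo, hi, mass), a_best, b, c, d)

data = [i / 20 for i in range(1, 20)]
theta0 = (k_from_constraint(0.5, 1.0, 0.1, -1.0, 0.0, 1.0, 1.0), 0.5, 1.0, 0.1, -1.0)
theta1 = update_a(theta0, data, 0.0, 1.0, 1.0, a_range=(-1.0, 1.0))
print(loglik(theta0, data, 0.0, 1.0), loglik(theta1, data, 0.0, 1.0))
```

The steps over c, b and d follow the same pattern with a different compensating parameter.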


Initialisation algorithm

Maximise over c; compensate using k: k is determined by c to make sure that $\int_{z \in I} f(z \mid \theta)\,dz = n/N$.

$$c \leftarrow \arg\max_{c'} \sum_{i:\, z_i \in I} \log f\big(z_i \mid k' = \mathrm{func}(\theta, c'),\, c',\, \theta\big).$$


Initialisation algorithm

Maximise over b; compensate using a: a is determined by b to make sure that $\int_{z \in I} f(z \mid \theta)\,dz = n/N$.

$$b \leftarrow \arg\max_{b'} \sum_{i:\, z_i \in I} \log f\big(z_i \mid a' = \mathrm{func}(\theta, b'),\, b',\, \theta\big).$$


Initialisation algorithm

Maximise over d; compensate using c: c is determined by d to make sure that $\int_{z \in I} f(z \mid \theta)\,dz = n/N$.

$$d \leftarrow \arg\max_{d'} \sum_{i:\, z_i \in I} \log f\big(z_i \mid c' = \mathrm{func}(\theta, d'),\, d',\, \theta\big).$$
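Because Equation (1) is linear in k, a and c once b and d are fixed, the compensating parameter in each step can be solved for in closed form. A sketch, writing $I = [e_1, e_2]$ and $E(\beta) = (e^{\beta e_2} - e^{\beta e_1})/\beta$ for $\beta \neq 0$ (notation assumed here, not from the slides):

```latex
\begin{aligned}
\text{steps over } a \text{ and } c:&\quad k = \frac{n/N - a\,E(b) - c\,E(d)}{e_2 - e_1},\\
\text{step over } b:&\quad a = \frac{n/N - k\,(e_2 - e_1) - c\,E(d)}{E(b)},\\
\text{step over } d:&\quad c = \frac{n/N - k\,(e_2 - e_1) - a\,E(b)}{E(d)}.
\end{aligned}
```

$E(\beta)$ is the integral of a positive function over $[e_1, e_2]$, so it is strictly positive and the divisions are always defined.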


Initialisation algorithm

Check for convergence:
• At this point all parameters have been updated at least once.
• Calculate the likelihood and check whether there is a significant improvement.
• If it improved, iterate again; otherwise, return.


Example run (with initialisation)

[Figure: example run with initialisation; both axes range from −6 to 6]


Model selection: Split-point for dataset

New data-set: 50 samples from the standard Normal distribution

[Figure: log-likelihood (vertical axis, about −155 to −120) against split-point location (horizontal axis, −2.5 to 2)]

Log-likelihood of the data using ML estimators for different split-points
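A minimal sketch of locating a single split-point by likelihood maximisation. To keep it short, each side uses a constant MTE (m = 0, only a₀) instead of the two-exponential-term density used in the slides; for that special case Equation (1) alone gives the ML estimate kⱼ = nⱼ/(N·|Iⱼ|). Searching only midpoints between consecutive observations is likewise an assumption, as are the function names and the toy data.

```python
import math

def constant_mte_loglik(data, lo, hi, split):
    """Log-likelihood of a two-piece constant MTE on [lo, hi], split at `split`.

    For a constant piece, Equation (1) gives the ML estimate
    k_j = n_j / (N * |I_j|), so no numerical optimisation is needed.
    """
    N = len(data)
    total = 0.0
    for a, b in ((lo, split), (split, hi)):
        # Pieces are [lo, split) and [split, hi].
        n_j = sum(1 for z in data if a <= z < b or (b == hi and z == hi))
        if n_j > 0:
            total += n_j * math.log(n_j / (N * (b - a)))
    return total

def best_split(data, lo, hi):
    """Try the midpoints between consecutive distinct observations."""
    zs = sorted(data)
    candidates = [(x + y) / 2.0 for x, y in zip(zs, zs[1:]) if x < y]
    return max(candidates, key=lambda s: constant_mte_loglik(data, lo, hi, s))

data = [0.1, 0.3, 0.5, 0.7, 0.9, 2.0, 4.0, 6.0, 8.0, 9.5]
s = best_split(data, 0.0, 10.0)
print(s, constant_mte_loglik(data, 0.0, 10.0, s))
```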


Model selection: No. parameters per interval

[Figure: plot over z from −2.5 to 2; vertical axis from 0 to 0.8]

Interval [−2.5200, −0.1303):
• Constant term: log L(θ₁|z) = −77.641.
• 1 exponential term: log L(θ₁|z) = −55.317 ⟹ p = 0.000.
• 2 exponential terms: log L(θ₁|z) = −55.314 ⟹ p = 0.996.


Interval [−0.1303, 2.2368):
• Constant term: log L(θ₂|z) = −77.742.
• 1 exponential term: log L(θ₂|z) = −64.490 ⟹ p = 0.000.
• 2 exponential terms: log L(θ₂|z) = −64.490 ⟹ p = 1.000.
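The p-values above come from likelihood-ratio tests between nested models. A minimal sketch, assuming each additional exponential term adds two free parameters (its coefficient and exponent), so the statistic 2(log L_large − log L_small) is compared with a χ² distribution with 2 degrees of freedom, whose survival function is exp(−x/2). The slides do not state the degrees of freedom used; with the rounded log-likelihoods shown, the sketch reproduces the reported p-values to within rounding.

```python
import math

def lrt_p_value(loglik_small, loglik_large, df=2):
    """p-value of the likelihood-ratio test between two nested models.

    Uses the asymptotic chi-squared distribution of 2 * (l_large - l_small).
    Only df = 2 is supported here, where the chi-squared survival function
    has the closed form exp(-x / 2).
    """
    if df != 2:
        raise NotImplementedError("this sketch only covers df = 2")
    stat = max(0.0, 2.0 * (loglik_large - loglik_small))
    return math.exp(-stat / 2.0)

# Log-likelihoods reported for the two intervals: constant, 1 term, 2 terms.
reported = {
    "interval 1": (-77.641, -55.317, -55.314),
    "interval 2": (-77.742, -64.490, -64.490),
}
for name, (l0, l1, l2) in reported.items():
    print(f"{name}: const vs 1 term p = {lrt_p_value(l0, l1):.3f}, "
          f"1 vs 2 terms p = {lrt_p_value(l1, l2):.3f}")
```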


Conclusions

• We have described an efficient method for learning ML estimates of univariate MTEs.
• ML estimates are fairly robust; the improvement over the traditional (regression-based) method is substantial.
• ML estimates can be used for model selection:
   – the number of exponential terms in each interval;
   – the number of split-points, and their location.
• Ongoing work: extension to conditional distributions:
   – learning parameters of conditional distributions (“solved”);
   – locating split-points (difficult; some progress has been made).
