Parameter Estimation in Mixtures of Truncated Exponentials
Helge Langseth1 Thomas D. Nielsen2 Rafael Rumí3
Antonio Salmerón3
1Dept. of Computer and Information Science, The Norwegian University of Science and Technology, Norway
2Dept. of Computer Science, Aalborg University, Denmark
3Dept. of Statistics and Applied Mathematics, University of Almería, Spain
PGM, September 2008
1 Langseth, Nielsen, Rumí and Salmerón Parameter estimation in MTEs
Outline

1 Background
  Motivation
  Mixtures of Truncated Exponentials
2 Learning MTEs from data
  Background
  Maximum likelihood estimation in MTEs
  Constrained optimisation and Lagrange multipliers
  The Newton-Raphson method
  The initialisation procedure
3 Model selection
  Locating splitpoints
  Determining model complexity
4 Conclusions
Mixtures of Truncated Exponentials
[Figure: the density f(z) of Z, plotted over [−3, 3], and the logistic curve P(Y = 1 | z), plotted over [−6, 6].]
Calculate P(Y = 1) in Hugin: “Illegal link”

f(z) = (1/√(2π)) · exp(−z²/2)
P(Y = 1 | z) = 1/(1 + exp(−z))
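The exact answer is P(Y = 1) = 1/2, since the integrand f(z) · P(Y = 1 | z) pairs symmetrically around z = 0. A quick numerical check (a sketch of ours, not part of the paper; function names are illustrative):

```python
import math

# Numerical check of P(Y = 1) = ∫ f(z) · P(Y = 1 | z) dz, with f the
# standard normal density and a logistic link.

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_y1(lo=-8.0, hi=8.0, n=4000):
    # trapezoidal quadrature of phi(z) * sigmoid(z) over [lo, hi]
    h = (hi - lo) / n
    total = 0.5 * (phi(lo) * sigmoid(lo) + phi(hi) * sigmoid(hi))
    total += sum(phi(lo + i * h) * sigmoid(lo + i * h) for i in range(1, n))
    return h * total

print(round(p_y1(), 4))  # 0.5, exactly as the symmetry argument predicts
```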
Mixtures of Truncated Exponentials
Calculate P(Y = 1) with MTEs: P(Y = 1) ≈ 0.4996851

f(z) =
  −0.0172 + 0.931 · exp(1.27z)    if −3 ≤ z < −1
  0.442 − 0.0385 · exp(−1.64z)    if −1 ≤ z < 0
  0.442 − 0.0385 · exp(1.64z)     if 0 ≤ z < 1
  −0.0172 + 0.9314 · exp(−1.27z)  if 1 ≤ z < 3

P(Y = 1 | z) =
  0                               if z < −5
  −0.0217 + 0.522 · exp(0.635z)   if −5 ≤ z < 0
  1.0217 − 0.522 · exp(−0.635z)   if 0 ≤ z ≤ 5
  1                               if z > 5
The MTE model
Definition (Univariate MTE potential over a continuous variable)
Let Z be a continuous variable. A function f : Ω_Z → ℝ⁺₀ is an MTE potential over Z

1 if
  f(z) = a₀ + ∑_{i=1}^{m} a_i · exp(b_i · z)
  for all z ∈ Ω_Z, where the a_i, b_i are real numbers,
2 . . . or if there is a partition of Ω_Z into intervals I₁, . . . , I_k s.t. f is defined as above on each I_j.
Generalization to arbitrary hybrid domains (Moral et al. 2001)
The definition transfers to multivariate domains containing both continuous and discrete variables.
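As a minimal illustration of the definition, an MTE potential can be stored as a list of pieces and evaluated pointwise. This sketch is ours (the data structure and names are illustrative, not from the paper); it uses the 4-piece approximation of the standard normal density shown earlier:

```python
import math

# An MTE potential as a list of pieces (low, high, a0, [(a_i, b_i), ...]),
# evaluated pointwise over its partition of the domain.

def mte_eval(pieces, z):
    for low, high, a0, terms in pieces:
        if low <= z < high:
            return a0 + sum(a * math.exp(b * z) for a, b in terms)
    return 0.0  # outside the support

# The 4-piece MTE approximation of the standard normal density from the
# earlier slide:
normal_mte = [
    (-3.0, -1.0, -0.0172, [(0.931, 1.27)]),
    (-1.0, 0.0, 0.442, [(-0.0385, -1.64)]),
    (0.0, 1.0, 0.442, [(-0.0385, 1.64)]),
    (1.0, 3.0, -0.0172, [(0.9314, -1.27)]),
]
print(round(mte_eval(normal_mte, 0.0), 4))  # 0.4035, close to N(0,1)'s 0.3989
```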
Learning MTEs from data
[Figure: histogram of the sample data over [0, 1].]
The MTE learning problem
How to find the MTE-distribution that generated this data?
Learning MTEs from data
The learning task involves three basic steps:
1 Determine the intervals into which Ω_Z will be partitioned.
2 Determine the number of exponential terms in the mixture for each interval.
3 Estimate the parameters.

Simplifying assumptions
In this work we are concerned with the univariate case. For simplicity we will initially assume that:
The intervals into which Ω_Z will be partitioned are known;
The number of exponential terms in the mixture for each interval is fixed to 2, giving target density
  f(z) = k + a · exp(b · z) + c · exp(d · z).
Learning MTEs from data by Maximum Likelihood
Why learn MTEs using Maximum Likelihood?
Well developed core theory, incl. good asymptotic properties under regularity conditions.
ML parameters give access to a variety of model estimation procedures:
  LRT or BIC for selecting the number of exponential terms;
  Likelihood maximisation to locate split-points.

Problems
The likelihood equations cannot be solved analytically.
Identifiability of parameters.
Initial observations
We will assume target density
  f(z | θ_j) = k_j + a_j · exp(b_j · z) + c_j · exp(d_j · z),  z ∈ I_j
for interval I_j; θ_j = {k_j, a_j, b_j, c_j, d_j}.

Denote by n_j the number of observations from interval I_j and let N = ∑_j n_j. Then the ML solution θ_j must satisfy
  ∫_{z ∈ I_j} f(z | θ_j) dz = n_j / N.    (1)

Parameter independence
θ_k can be found independently of θ_l as long as Equation (1) is satisfied for all θ_j. We will therefore look at a single interval I from now on (and drop the index j when appropriate).
Constrained optimisation
Maximize
  log L(θ | z) = ∑_{i : z_i ∈ I} log L(θ | z_i) = ∑_{i : z_i ∈ I} log f(z_i | θ)
Subject to
  ∫_{z ∈ I} f(z | θ) dz − n/N = 0,
  f(e1 | θ) ≥ 0,
  f(e2 | θ) ≥ 0.
Constrained optimisation
Maximize
  log L(θ | z) = ∑_{i : z_i ∈ I} log L(θ | z_i) = ∑_{i : z_i ∈ I} log f(z_i | θ)
Subject to
  ∫_{z ∈ I} f(z | θ) dz − n/N = 0,
  f(e1 | θ) − s1² = 0,
  f(e2 | θ) − s2² = 0,
where the slack variables s1, s2 turn the inequality constraints f(e1 | θ) ≥ 0 and f(e2 | θ) ≥ 0 into equalities.

Notation:
  φ = [θᵀ sᵀ]ᵀ,  ψ = [θᵀ sᵀ λᵀ]ᵀ = [φᵀ λᵀ]ᵀ,
  g0(φ) = ∫_{z ∈ I} f(z | θ) dz − n/N,
  g1(φ) = f(e1 | θ) − s1²,  g2(φ) = f(e2 | θ) − s2².

Lagrange multipliers
Find the root of ∇_ψ (log L(θ | z) + λᵀ g(φ)) to solve the constrained optimisation problem.
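To make the root-finding step concrete, here is a toy equality-constrained problem solved by the same recipe, finding the root of ∇_ψ(f + λᵀg) with Newton's method. The problem and all names are illustrative choices of ours, not the MTE likelihood itself:

```python
# Toy problem: maximise f(x, y) = -(x^2 + y^2) subject to
# g(x, y) = x + y - 1 = 0, by finding the root of
# A(psi) = grad_{x, y, lambda} (f + lambda * g).

def A(psi):
    x, y, lam = psi
    return [-2 * x + lam, -2 * y + lam, x + y - 1]

def jacobian(psi, eps=1e-6):
    # forward-difference approximation of the Jacobian of A
    base = A(psi)
    cols = []
    for j in range(3):
        bumped = list(psi)
        bumped[j] += eps
        col = A(bumped)
        cols.append([(col[i] - base[i]) / eps for i in range(3)])
    # transpose so that J[i][j] = dA_i / dpsi_j
    return [[cols[j][i] for j in range(3)] for i in range(3)]

def solve3(M, b):
    # Gaussian elimination with partial pivoting on a 3x3 system
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

psi = [0.0, 0.0, 0.0]          # initial guess
for _ in range(20):            # psi <- psi - J^{-1} A(psi)
    step = solve3(jacobian(psi), A(psi))
    psi = [p - s for p, s in zip(psi, step)]

print([round(v, 6) for v in psi])  # constrained maximum at x = y = 0.5
```

Because A is linear here, the iteration converges in a single step; on the MTE log-likelihood A is nonlinear and several iterations are needed.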
The Newton-Raphson method

Example: Find x s.t. h(x) = 0.
Initial “guess”: x = x₀; approximate h(x) by its tangent in x₀.
New “guess” x₁: the point where the tangent crosses the abscissa.
Iterate using the general formula x_{t+1} ← x_t − h′(x_t)⁻¹ · h(x_t).
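The scalar iteration can be sketched in a few lines (a generic implementation of ours, not code from the paper):

```python
# Generic scalar Newton-Raphson iteration x_{t+1} = x_t - h(x_t) / h'(x_t).

def newton(h, h_prime, x0, tol=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = h(x) / h_prime(x)
        x -= step
        if abs(step) < tol:  # converged: the update is negligible
            break
    return x

# Root of h(x) = x^2 - 2 starting from x0 = 1 converges to sqrt(2)
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(round(root, 6))  # 1.414214
```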
The Lagrange Multipliers method
Maximise likelihood given constraints
Use the multivariate Newton-Raphson method to solve A(ψ | z) ≡ ∇_ψ (log L(θ | z) + λᵀ g(φ)) = 0:
  ψ_{t+1} ← ψ_t − J(A(ψ_t | z))⁻¹ · A(ψ_t | z).

Initialisation of Newton-Raphson:
Choose θ₀ “randomly”, giving s₀ = [√f(e1 | θ₀)  √f(e2 | θ₀)]ᵀ;
λ₀ = [1 1]ᵀ (chosen rather arbitrarily).
Example-run, Lagrange multipliers
[Figure: likelihood surface over (b₀, d₀) ∈ [−6, 6]².]
Likelihood of example data D = {z₁, . . . , z_n}; value of point (b₀, d₀) given as
  max_{k,a,c} ∏_i (k + a · exp(b₀ · z_i) + c · exp(d₀ · z_i)).
Initialisation of the Newton-Raphson method
Initialisation procedure – Main idea

Instead of maximising over 5 parameters under the constraint
  ∫_{z ∈ I} f(z | θ) dz = n/N,
we iteratively maximise over pairs of parameters.
One parameter is varied freely; the other is chosen to make sure that the constraint is fulfilled.
A high-dimensional constrained optimisation problem is thus replaced by a series of “unconstrained” optimisation problems, each in one dimension.
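A sketch of one such pairwise step, under simplifying assumptions of ours: on I = [0, 1] with a single exponential term f(z) = k + a · exp(b · z) and taking n/N = 1, the constraint fixes k = 1 − a(e^b − 1)/b, so maximising over a alone is a one-dimensional problem (grid search here for simplicity; all names and the toy data are illustrative):

```python
import math

# One pairwise initialisation step: vary `a` freely, solve `k` from the
# mass constraint  ∫_[0,1] (k + a*exp(b*z)) dz = mass,
# i.e.  k = mass - a * (e^b - 1) / b.

def k_from_constraint(a, b, mass=1.0):
    return mass - a * (math.exp(b) - 1.0) / b

def loglik(data, k, a, b):
    total = 0.0
    for z in data:
        fz = k + a * math.exp(b * z)
        if fz <= 0.0:               # infeasible: density must stay positive
            return float("-inf")
        total += math.log(fz)
    return total

def maximise_a(data, b, grid):
    # one-dimensional grid search over a; k tracks a through the constraint
    best = max(grid, key=lambda a: loglik(data, k_from_constraint(a, b), a, b))
    return best, k_from_constraint(best, b)

data = [0.05, 0.1, 0.2, 0.3, 0.55, 0.7, 0.9]
a, k = maximise_a(data, b=1.0, grid=[i / 100 - 1.0 for i in range(201)])
print(round(k + a * (math.e - 1.0), 6))  # mass is preserved: 1.0
```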
Initialisation algorithm
Initialisation: Choose some “random” starting values for θ, making sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
[Diagram: the three terms of the density — the constant k, a · exp(b · z), and c · exp(d · z).]
Initialisation algorithm
Maximise over a; compensate using k:
k is determined by a to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  a ← argmax_{a′} ∑_{i : z_i ∈ I} log f(z_i | k′ = func(θ, a′), a′, θ).
Initialisation algorithm
Maximise over c; compensate using k:
k is determined by c to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  c ← argmax_{c′} ∑_{i : z_i ∈ I} log f(z_i | k′ = func(θ, c′), c′, θ).
Initialisation algorithm
Maximise over b; compensate using a:
a is determined by b to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  b ← argmax_{b′} ∑_{i : z_i ∈ I} log f(z_i | a′ = func(θ, b′), b′, θ).
Initialisation algorithm
Maximise over d; compensate using c:
c is determined by d to make sure that ∫_{z ∈ I} f(z | θ) dz = n/N.
  d ← argmax_{d′} ∑_{i : z_i ∈ I} log f(z_i | c′ = func(θ, d′), d′, θ).
Initialisation algorithm
Check for convergence:
At this point all parameters have been updated at least once.
Calculate the likelihood and check if there is a significant improvement.
If improved then iterate again, otherwise return.
Example run (with initialisation)
[Figure: likelihood surface over (b₀, d₀) ∈ [−6, 6]².]
Model selection: Split-point for dataset
New data-set: 50 samples from the standard Normal distribution
[Figure: likelihood of the data using ML estimators for different split-points, over split-point locations in [−2.5, 2].]
Model selection: No. parameters per interval
[Figure: fitted density over the data sample, on [−2.5, 2].]
Interval [−2.5200, −0.1303〉:
Constant term: L(θ₁ | z) = −77.641.
1 exponential term: L(θ₁ | z) = −55.317 ⇒ p = 0.000.
2 exponential terms: L(θ₁ | z) = −55.314 ⇒ p = 0.996.
Model selection: No. parameters per interval
[Figure: fitted density over the data sample, on [−2.5, 2].]
Interval [−0.1303, 2.2368〉:
Constant term: L(θ₂ | z) = −77.742.
1 exponential term: L(θ₂ | z) = −64.490 ⇒ p = 0.000.
2 exponential terms: L(θ₂ | z) = −64.490 ⇒ p = 1.000.
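These p-values are consistent with a likelihood-ratio test in which adding one exponential term adds two free parameters (a_i and b_i), so the test statistic is referred to a χ² distribution with 2 degrees of freedom, whose tail probability is exp(−x/2). A sketch (the function name is ours); using the rounded log-likelihoods from the slides reproduces the reported p-values up to rounding:

```python
import math

# Likelihood-ratio test for adding one exponential term: two extra free
# parameters give a chi-square reference distribution with 2 degrees of
# freedom, whose tail probability is exp(-x / 2).

def lrt_pvalue(loglik_small, loglik_big):
    statistic = 2.0 * (loglik_big - loglik_small)
    return math.exp(-statistic / 2.0)

# Log-likelihoods from the first interval on the slide:
print(round(lrt_pvalue(-77.641, -55.317), 3))  # constant vs 1 term: 0.0
print(round(lrt_pvalue(-55.317, -55.314), 3))  # 1 term vs 2 terms: 0.997
```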
Conclusions
We have described an efficient method for learning ML estimates of univariate MTEs.
ML estimates are fairly robust; the improvement over the traditional (regression-based) method is substantial.
ML estimates can be used for model selection:
  No. exponential terms in each interval;
  Number of split-points, and their location.
Ongoing work: Extension to conditional distributions:
  Learning parameters of conditional distributions (“solved”).
  Locating split-points (difficult; some progress has been made).