The Chow-Liu algorithm based on the MDL with discrete and continuous variables
Joe Suzuki
Osaka University
AIGM 2014, Paris
Joe Suzuki (Osaka University), The Chow-Liu algorithm based on the MDL with discrete and continuous variables, AIGM 2014, Paris
The Chow-Liu Algorithm
Chow-Liu
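$P_{1,\dots,N}$: probability distribution of $X^{(1)},\dots,X^{(N)}$ ($N \ge 1$)
$G = (V, E)$: undirected graph
Initialize $E := \{\}$, $V := \{1,\dots,N\}$, $\mathcal{E} := \{\{i,j\} \mid i \ne j,\ i,j \in V\}$, and repeat until $\mathcal{E} = \{\}$:
1. choose $\{i,j\} \in \mathcal{E}$ that maximizes $I(i,j)$
2. remove $\{i,j\}$ from $\mathcal{E}$
3. if no loop is generated, add $\{i,j\}$ to $E$
Mutual information of $X^{(i)}, X^{(j)}$:
$$I(i,j) := \sum_{x^{(i)}}\sum_{x^{(j)}} P_{i,j}(x^{(i)},x^{(j)}) \log\frac{P_{i,j}(x^{(i)},x^{(j)})}{P_i(x^{(i)})\,P_j(x^{(j)})}$$
The tree $E$ with $\sum_{\{i,j\}\in E} I(i,j) \to \max$ is the tree with $D(P_{1,\dots,N} \| Q) \to \min$, where $Q$ is the distribution that factorizes along $E$.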
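As a minimal sketch (Python, natural logarithms; the function name `mutual_information` is illustrative), $I(i,j)$ can be computed from a joint probability table whose rows are indexed by $x^{(i)}$ and whose columns are indexed by $x^{(j)}$:

```python
import math

def mutual_information(p_ij):
    """Return I(i, j) in nats for a joint pmf given as rows: p_ij[x][y] = P_ij(x, y)."""
    p_i = [sum(row) for row in p_ij]          # marginal P_i(x)
    p_j = [sum(col) for col in zip(*p_ij)]    # marginal P_j(y)
    total = 0.0
    for x, row in enumerate(p_ij):
        for y, pxy in enumerate(row):
            if pxy > 0:                       # terms with P_ij = 0 contribute 0
                total += pxy * math.log(pxy / (p_i[x] * p_j[y]))
    return total

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # independent bits
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # X^(j) = X^(i): log 2
```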
The Chow-Liu Algorithm
Example
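$$\begin{aligned}
Q(x^{(1)},x^{(2)},x^{(3)},x^{(4)})
&= \frac{P_{1,2}(x^{(1)},x^{(2)})}{P_1(x^{(1)})P_2(x^{(2)})}\cdot\frac{P_{1,3}(x^{(1)},x^{(3)})}{P_1(x^{(1)})P_3(x^{(3)})}\cdot\frac{P_{1,4}(x^{(1)},x^{(4)})}{P_1(x^{(1)})P_4(x^{(4)})}\cdot P_1(x^{(1)})P_2(x^{(2)})P_3(x^{(3)})P_4(x^{(4)})\\
&= P(x^{(1)})\,P(x^{(2)}|x^{(1)})\,P(x^{(3)}|x^{(1)})\,P(x^{(4)}|x^{(1)})
\end{aligned}$$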
i        1   1   2   1   2   3
j        2   3   3   4   4   4
I(i,j)  12  10   8   6   4   2
[Figure: four panels on nodes 1, 2, 3, 4 showing the edges selected step by step]
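The greedy loop can be sketched in Python as below. The function name `chow_liu` and the union-find loop check are illustrative choices, not taken from the slides. Run on the table above, it keeps {1,2} and {1,3}, skips {2,3} because that edge would close a loop, and then keeps {1,4}. This is the tree behind $Q$.

```python
def chow_liu(n_nodes, weights):
    """Scan edges by decreasing weight and keep those that close no loop.

    weights maps an edge (i, j) to I(i, j); nodes are numbered 1..n_nodes.
    Loops are detected with a union-find structure (a Kruskal-style sketch).
    """
    parent = {v: v for v in range(1, n_nodes + 1)}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                        # different components: no loop
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Mutual information values from the example table.
example = {(1, 2): 12, (1, 3): 10, (2, 3): 8, (1, 4): 6, (2, 4): 4, (3, 4): 2}
print(chow_liu(4, example))   # [(1, 2), (1, 3), (1, 4)]
```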
The Chow-Liu Algorithm
Dendroid Distribution
$X^{(1)},\dots,X^{(N)}$: discrete random variables; $V := \{1,\dots,N\}$; $E \subseteq \{\{i,j\} \mid i \ne j,\ i,j \in V\}$
$$Q(x^{(1)},\dots,x^{(N)}|E) = \prod_{\{i,j\}\in E}\frac{P_{i,j}(x^{(i)},x^{(j)})}{P_i(x^{(i)})\,P_j(x^{(j)})}\prod_{i\in V}P_i(x^{(i)}),$$
where $\{P_i(x^{(i)})\}_{i\in V}$ and $\{P_{i,j}(x^{(i)},x^{(j)})\}_{i\ne j}$ are obtained from $P_{1,\dots,N}(x^{(1)},\dots,x^{(N)})$.
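Here is a small numerical check, assuming three binary variables and a hypothetical positive joint table. $Q(\cdot|E)$ is built only from the marginals of $P_{1,\dots,N}$, and it sums to one and keeps the pairwise marginals on the edges of a forest. The function name `dendroid` is illustrative, and variables are indexed from 0 in the code.

```python
import itertools

def dendroid(p, edges):
    """Q(x|E) = prod_{(i,j) in E} P_ij / (P_i P_j) * prod_i P_i for every joint state x.

    p: dict mapping joint states (tuples) to P_{1..N}(x); here all states have p > 0.
    edges: list of pairs (i, j) with 0-based variable indices.
    """
    n_vars = len(next(iter(p)))

    def marginal(idx):
        m = {}
        for x, px in p.items():
            key = tuple(x[k] for k in idx)
            m[key] = m.get(key, 0.0) + px
        return m

    p1 = [marginal((i,)) for i in range(n_vars)]
    p2 = {(i, j): marginal((i, j)) for i, j in edges}
    q = {}
    for x in p:
        val = 1.0
        for i in range(n_vars):
            val *= p1[i][(x[i],)]
        for i, j in edges:
            val *= p2[(i, j)][(x[i], x[j])] / (p1[i][(x[i],)] * p1[j][(x[j],)])
        q[x] = val
    return q

# Hypothetical positive joint table over three binary variables.
states = list(itertools.product([0, 1], repeat=3))
p = {x: (k + 1) / 36 for k, x in enumerate(states)}   # weights 1..8, total 36
q = dendroid(p, [(0, 1), (1, 2)])                     # chain X(1) - X(2) - X(3)
print(sum(q.values()))
```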
The Chow-Liu Algorithm
Contribution
Starting from data: learning rather than approximation
Instead of the distribution $P_{1,\dots,N}$, we are given data $x^n = \{(x^{(1)}_i,\dots,x^{(N)}_i)\}_{i=1}^n$.
In any database, some fields are discrete and others are continuous.
Joe Suzuki: A Construction of Bayesian Networks from Databases Based on an MDL Principle, UAI 1993
David Edwards et al.: Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests, BMC Bioinformatics 2010
Joe Suzuki: Learning Bayesian network structures when discrete and continuous variables are present, PGM 2014
The Chow-Liu Algorithm
Maximum Likelihood (ML)
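$\{P_i(x^{(i)})\}_{i\in V}$ and $\{P_{i,j}(x^{(i)},x^{(j)})\}_{i\ne j}$ are obtained from $x^n$ as relative frequencies.
ML estimation of MI:
$$I(i,j) := \sum_{x^{(i)}}\sum_{x^{(j)}} P_{i,j}(x^{(i)},x^{(j)}) \log\frac{P_{i,j}(x^{(i)},x^{(j)})}{P_i(x^{(i)})\,P_j(x^{(j)})}$$
Empirical entropy given $E$ (minus the log-likelihood given $E$; $H(i)$ is the empirical entropy of $X^{(i)}$):
$$H^n(x^n|E) := n\sum_{i\in V}H(i) - n\sum_{\{i,j\}\in E}I(i,j)$$
ML seeks a tree even if $X^{(1)},\dots,X^{(N)}$ are independent. Because the empirical $I(i,j)$ is nonnegative, adding an edge never increases $H^n(x^n|E)$, so the true graph is not obtained even if $n \to \infty$.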
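The following is a minimal sketch of $H^n(x^n|E)$ and the ML estimate of $I(i,j)$ for discrete data. It assumes one list of observed values per variable and natural logarithms. The three columns are generated independently from a fixed seed, which makes them hypothetical data. The empirical $I(i,j)$ is still nonnegative, so adding edges never makes $H^n(x^n|E)$ larger.

```python
import math
import random
from collections import Counter

def entropy(column):
    """Empirical entropy H(i) in nats of a list of observed values."""
    n = len(column)
    return -sum(c / n * math.log(c / n) for c in Counter(column).values())

def empirical_mi(a, b):
    """ML estimate of I(i, j), written as H(i) + H(j) - H(i, j)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def empirical_entropy_given(data, edges):
    """H^n(x^n|E) = n * sum_i H(i) - n * sum_{(i,j) in E} I(i, j); data holds one column per variable."""
    n = len(data[0])
    return (n * sum(entropy(col) for col in data)
            - n * sum(empirical_mi(data[i], data[j]) for i, j in edges))

# Three independently generated binary columns (hypothetical data).
rng = random.Random(1)
data = [[rng.randrange(2) for _ in range(200)] for _ in range(3)]
print([empirical_mi(data[i], data[j]) for i, j in [(0, 1), (0, 2), (1, 2)]])
print(empirical_entropy_given(data, []), empirical_entropy_given(data, [(0, 1), (1, 2)]))
```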
The Chow-Liu Algorithm
Prior Distribution over Forest (V ,E )
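$p_{ij}$: the prior probability of $X^{(i)} \perp\!\!\!\perp X^{(j)}$
$$\pi(E) := \frac{1}{K}\prod_{\{i,j\}\in E}\frac{1-p_{ij}}{p_{ij}}, \qquad K := \sum_{E}\prod_{\{i,j\}\in E}\frac{1-p_{ij}}{p_{ij}},$$
where the sum defining $K$ runs over all forests $E$ on $V$.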
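For small $N$, the normalizing constant $K$ can be computed by brute force over all edge subsets that form forests. This is only an illustrative sketch: the enumeration is exponential in $N(N-1)/2$, variables are indexed from 0, and `p` is a hypothetical dictionary of $p_{ij}$ values. With $p_{ij} = 1/2$, every forest gets the same prior. For example, each of the 7 forests on three nodes gets $1/7$.

```python
import itertools

def is_forest(n_nodes, edges):
    """True if the undirected edge set on nodes 0..n_nodes-1 contains no loop."""
    parent = list(range(n_nodes))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False
        parent[ri] = rj
    return True

def forest_prior(n_nodes, p):
    """pi(E) = (1/K) * prod_{(i,j) in E} (1 - p_ij) / p_ij for every forest E.

    p maps each pair (i, j), i < j, to p_ij, the prior probability that
    X^(i) and X^(j) are independent.
    """
    pairs = list(itertools.combinations(range(n_nodes), 2))
    weights = {}
    for r in range(len(pairs) + 1):
        for edges in itertools.combinations(pairs, r):
            if is_forest(n_nodes, edges):
                w = 1.0
                for e in edges:
                    w *= (1 - p[e]) / p[e]
                weights[edges] = w
    K = sum(weights.values())            # normalizing constant
    return {E: w / K for E, w in weights.items()}

prior = forest_prior(3, {e: 0.5 for e in itertools.combinations(range(3), 2)})
print(len(prior), prior[()])
```

Setting every $p_{ij}$ below $1/2$ raises the weight of each edge above one, so the prior favors denser forests.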
The Chow-Liu Algorithm
Minimum Description Length (Suzuki, UAI-1993)
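$$R(i) := \int P(\{x^{(i)}_k\}_{k=1}^n|\theta)\,w(\theta)\,d\theta, \qquad R(i,j) := \int P(\{x^{(i)}_k, x^{(j)}_k\}_{k=1}^n|\theta)\,w(\theta)\,d\theta$$
($w$: prior density of the parameter $\theta$)
$$R^n(x^n|E) := \prod_{\{i,j\}\in E}\frac{R(i,j)}{R(i)R(j)}\prod_{i\in V}R(i), \qquad L(x^n|E) := -\log R^n(x^n|E)$$
Description length:
$$l(x^n) = -\log\pi(E) + L(x^n|E) \to \min$$
Bayesian estimation of MI:
$$J(i,j) := \frac{1}{n}\log\frac{R(i,j)}{R(i)R(j)}$$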
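The slides leave $w(\theta)$ generic. The sketch below assumes a Dirichlet(1/2, ..., 1/2) prior, which gives the Krichevsky-Trofimov (KT) mixture. Its sequential form $\prod_i \frac{c + 1/2}{i - 1 + |A|/2}$ matches the universal measure used later for $g^n$. Under that assumption, $R(i)$, $R(i,j)$ and $J(i,j)$ can be computed as follows. The function names are illustrative.

```python
import math
from collections import Counter

def log_kt(seq, alphabet_size):
    """log R for seq under the KT mixture (Dirichlet(1/2, ..., 1/2) prior w), computed
    sequentially as a sum of log (c(x_i) + 1/2) / (i - 1 + |A|/2)."""
    counts = Counter()
    total = 0.0
    for seen, x in enumerate(seq):        # seen = i - 1 symbols before x_i
        total += math.log((counts[x] + 0.5) / (seen + alphabet_size / 2))
        counts[x] += 1
    return total

def bayes_mi(a, b, alpha_a, alpha_b):
    """J(i, j) = (1/n) log R(i, j) / (R(i) R(j)); alpha_* are the alphabet sizes."""
    n = len(a)
    log_r_ij = log_kt(list(zip(a, b)), alpha_a * alpha_b)
    return (log_r_ij - log_kt(a, alpha_a) - log_kt(b, alpha_b)) / n

n = 1000
x = [k % 2 for k in range(n)]
y = [(k // 2) % 2 for k in range(n)]   # empirically independent of x
print(bayes_mi(x, y, 2, 2))            # negative: no edge
print(bayes_mi(x, x, 2, 2))            # positive: strong dependence
```

Unlike the ML estimate, $J(i,j)$ falls below zero for an empirically independent pair. This is what lets the MDL variant stop short of a spanning tree.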
The Chow-Liu Algorithm
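Expanding with an asymptotic approximation, where $k(E)$ is the number of parameters in $E$ and $\alpha(i)$ is the number of values $X^{(i)}$ takes, we find
$$L(x^n|E) \approx H^n(x^n|E) + \frac{1}{2}k(E)\log n$$
$$l(x^n) \approx H^n(x^n|E) + \frac{1}{2}k(E)\log n - \log\pi(E)$$
$$J(i,j) \approx I(i,j) - \frac{1}{2n}(\alpha(i)-1)(\alpha(j)-1)\log n - \frac{1}{n}\log\frac{1-p_{ij}}{p_{ij}}$$
The order in which edges are chosen differs between $I(i,j)$ and $J(i,j)$.
$J(i,j)$ can be negative, and an edge with negative $J(i,j)$ is not added. As a result, $J(i,j)$ yields a forest, while $I(i,j) \ge 0$ yields a tree.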
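Using the approximation above, the MDL variant of Chow-Liu can be sketched as follows. It scans pairs by decreasing approximate $J(i,j)$ and stops at the first non-positive value, so it may return a forest. Two assumptions are made for illustration: $\alpha(i)$ is taken as the number of distinct observed values of $X^{(i)}$, and every $p_{ij}$ is set to one value `p_ij`. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(column):
    """Empirical entropy in nats."""
    n = len(column)
    return -sum(c / n * math.log(c / n) for c in Counter(column).values())

def approx_j(a, b, p_ij=0.5):
    """I(i,j) - (alpha(i)-1)(alpha(j)-1) log n / (2n) - (1/n) log((1 - p_ij) / p_ij)."""
    n = len(a)
    alpha_a, alpha_b = len(set(a)), len(set(b))   # assumption: observed number of values
    mi = entropy(a) + entropy(b) - entropy(list(zip(a, b)))
    return (mi - (alpha_a - 1) * (alpha_b - 1) * math.log(n) / (2 * n)
            - math.log((1 - p_ij) / p_ij) / n)

def mdl_chow_liu(data, p_ij=0.5):
    """Chow-Liu with J instead of I: pairs with J <= 0 are never added, so a forest may result."""
    N = len(data)
    scores = {(i, j): approx_j(data[i], data[j], p_ij)
              for i in range(N) for j in range(i + 1, N)}
    parent = list(range(N))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    forest = []
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s <= 0:
            break                        # every remaining pair would lengthen the description
        ri, rj = find(i), find(j)
        if ri != rj:                     # adding {i, j} creates no loop
            parent[ri] = rj
            forest.append((i, j))
    return forest

n = 400
x0 = [k % 2 for k in range(n)]
x2 = [(k // 2) % 2 for k in range(n)]      # empirically independent of x0
print(mdl_chow_liu([x0, list(x0), x2]))    # [(0, 1)]
```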
The Chow-Liu Algorithm
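Universality
Universal measure w.r.t. a finite set $A$: there exists $R^n$ such that
$$\frac{1}{n}\log\frac{P^n(x^n)}{R^n(x^n)} \to 0 \qquad (x^n \in A^n)$$
with $P^n$-probability one as $n \to \infty$ for any $P^n$.
With $P(i) := \prod_{k=1}^n P(x^{(i)}_k)$ and $P(i,j) := \prod_{k=1}^n P(x^{(i)}_k, x^{(j)}_k)$,
$$\frac{1}{n}\log\frac{P(i)}{R(i)} \to 0, \qquad \frac{1}{n}\log\frac{P(i,j)}{R(i,j)} \to 0.$$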
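The following is a numerical illustration, not a proof. It assumes i.i.d. Bernoulli data and takes the sequential KT estimator as $R^n$. As $n$ grows, the normalized log ratio $\frac{1}{n}\log\frac{P^n(x^n)}{R^n(x^n)}$ shrinks toward zero. The function names and the value $\theta = 0.3$ are arbitrary choices.

```python
import math
import random

def log_kt_binary(bits):
    """Sequential KT estimate log R^n(x^n) for a 0/1 sequence."""
    counts = [0, 0]
    total = 0.0
    for seen, b in enumerate(bits):
        total += math.log((counts[b] + 0.5) / (seen + 1.0))   # |A|/2 = 1 for binary
        counts[b] += 1
    return total

def normalized_log_ratio(theta, n, seed=0):
    """(1/n) log P^n(x^n) / R^n(x^n) for x^n drawn i.i.d. Bernoulli(theta)."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < theta else 0 for _ in range(n)]
    log_p = sum(math.log(theta if b else 1 - theta) for b in bits)
    return (log_p - log_kt_binary(bits)) / n

for n in (100, 1000, 10000):
    print(n, normalized_log_ratio(0.3, n))
```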
The Chow-Liu Algorithm
Consistency
$$Q^n(x^n|E) := \prod_{\{i,j\}\in E}\frac{P(i,j)}{P(i)P(j)}\prod_{i\in V}P(i)$$
With probability 1 as $n \to \infty$, for any $Q^n(\cdot|E)$,
$$\frac{1}{n}\log\frac{Q^n(x^n|E)}{R^n(x^n|E)} \to 0.$$
For large $n$,
$$\pi(E_1)Q^n(x^n|E_1) \le \pi(E_2)Q^n(x^n|E_2) \iff \pi(E_1)R^n(x^n|E_1) \le \pi(E_2)R^n(x^n|E_2).$$
Hence a maximum posterior probability forest is obtained for large $n$.
The Chow-Liu Algorithm
ML vs MDL
                  ML                           MDL
Choice of E       minimize $H^n(x^n|E)$        minimize $H^n(x^n|E) + \frac{1}{2}k(E)\log n - \log\pi(E)$
Choice of {i,j}   maximize $I(i,j)$            maximize $J(i,j)$
Criterion         fitness of $x^n$ to $E$      fitness of $x^n$ to $E$ and simplicity of $E$
Consistency       No                           Yes
When Density Exists
When density f exists for X (Ryabko, 2009)
$A_0 := \{A\}$, and $A_{j+1}$ is a refinement of $A_j$.
For each $j$, $x^n = (x_1,\dots,x_n) \in \mathbb{R}^n \mapsto (a^{(j)}_1,\dots,a^{(j)}_n) \in A_j^n$.
[Figure: nested partitions $A_1, A_2, \dots, A_j$]
$$g^n_j(x^n) := \frac{R^n_j(a^{(j)}_1,\dots,a^{(j)}_n)}{\lambda(a^{(j)}_1)\cdots\lambda(a^{(j)}_n)}, \qquad j = 1, 2, \dots$$
$\lambda$: Lebesgue measure (width of an interval); $R^n_j$: universal measure w.r.t. $A_j$
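The sketch below computes one level $g^n_j$. It assumes data in $[0,1)$ and dyadic partitions $A_j$ of $2^j$ equal intervals, which is one possible nested sequence chosen here for illustration. $R^n_j$ is taken as the KT measure in closed form. The function names are hypothetical. At level 0 the single cell has width 1, so $g^n_0 = 1$.

```python
import math

def log_kt_counts(counts, m):
    """Closed form of the KT (Dirichlet(1/2)) measure for cell counts over m cells:
    log[ Gamma(m/2) prod_a Gamma(c_a + 1/2) / (Gamma(1/2)^m Gamma(n + m/2)) ]."""
    n = sum(counts)
    return (math.lgamma(m / 2) + sum(math.lgamma(c + 0.5) for c in counts)
            - m * math.lgamma(0.5) - math.lgamma(n + m / 2))

def log_g_level(xs, j):
    """log g_j^n(x^n) = log R_j^n(a^(j)) - sum_i log lambda(a_i^(j)) for the dyadic
    partition A_j of [0, 1) into 2**j intervals of width 2**-j (assumed partitions)."""
    m = 2 ** j
    counts = [0] * m
    for x in xs:
        counts[min(int(x * m), m - 1)] += 1        # cell a_i^(j) containing x_i
    return log_kt_counts(counts, m) + len(xs) * j * math.log(2)   # -sum log 2**-j

xs = [0.1 + 0.3 * k / 100 for k in range(100)]      # data concentrated in [0.1, 0.4)
print([round(log_g_level(xs, j), 2) for j in range(5)])
```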
When Density Exists
$\sum_j w_j = 1$, $w_j > 0$:
$$g^n(x^n) := \sum_{j=1}^\infty w_j\, g^n_j(x^n)$$
$f$: density function; $f_j$: density function at level $j$; $f^n(x^n) := f(x_1)\cdots f(x_n)$
Ryabko 2009: for any $f$ such that $D(f\|f_j) \to 0$ ($j \to \infty$),
$$\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0$$
as $n \to \infty$.
When Density Does Not Exist
Extensions from Ryabko 2009
Remove the assumption that a density exists.
Remove the restriction on the density class: “for any $f$ s.t. $D(f\|f_j) \to 0$ ($j \to \infty$)” → “for any $f$”
When Density Does Not Exist
When density does not exist for X (Suzuki 2011)
$B_1 := \{\{1\},\{2,3,\dots\}\}$, $B_2 := \{\{1\},\{2\},\{3,4,\dots\}\}$, $\dots$, $B_k := \{\{1\},\{2\},\dots,\{k\},\{k+1,k+2,\dots\}\}$, $\dots$
For each level $k$, $x^n = (x_1,\dots,x_n) \in \mathbb{N}^n \mapsto (b^{(k)}_1,\dots,b^{(k)}_n) \in B_k^n$
$$\eta(\{k\}) = \frac{1}{k} - \frac{1}{k+1}$$
$$g^n_k(x^n) := \frac{R^n_k(b^{(k)}_1,\dots,b^{(k)}_n)}{\eta(b^{(k)}_1)\cdots\eta(b^{(k)}_n)}$$
$\sum_k \omega_k = 1$, $\omega_k > 0$:
$$g^n(x^n) := \sum_{k=1}^\infty \omega_k\, g^n_k(x^n)$$
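The sketch below computes $g^n_k$ for positive integer data, taking the KT measure for $R^n_k$ as an assumption. From the definition of $\eta$, the tail cell $\{k+1, k+2, \dots\}$ gets $\eta = 1/(k+1)$ by telescoping, so $\eta$ sums to one over $B_k$. The function names are hypothetical.

```python
import math
from collections import Counter

def eta(cell, k):
    """eta of a cell of B_k, labeled by m: eta({m}) = 1/m - 1/(m+1) for m <= k,
    and label k+1 stands for the tail {k+1, k+2, ...} with eta = 1/(k+1)."""
    return 1 / (k + 1) if cell == k + 1 else 1 / cell - 1 / (cell + 1)

def log_g_level_int(ys, k):
    """log g_k^n(y^n) = log R_k^n(b^(k)) - sum_i log eta(b_i^(k)) for positive integers y_i."""
    m = k + 1                          # |B_k|: k singletons plus one tail cell
    counts = Counter()
    total = 0.0
    for seen, y in enumerate(ys):      # seen = i - 1
        b = y if y <= k else k + 1     # cell b_i^(k) containing y_i
        total += math.log((counts[b] + 0.5) / (seen + m / 2)) - math.log(eta(b, k))
        counts[b] += 1
    return total

ys = [1, 2, 1, 3, 7, 1, 2, 40, 1, 1]
print([round(log_g_level_int(ys, k), 3) for k in (1, 2, 4, 8)])
```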
When Density Does Not Exist
$D(f\|f_j) \to 0$ as $j \to \infty$ (1)
$$\int_{1/2}^{1} f(x)\,dx > 0$$
[Figure: the interval from 0 to 1 on the $x$ axis with partitions $C_0, C_1, C_2, C_3, \dots$]
When Density Does Not Exist
$D(f\|f_j) \to 0$ as $j \to \infty$ (2)
$$\int_{1}^{\infty} f(x)\,dx > 0$$
[Figure: the $x$ axis with points 0 and 1 and partitions $C_0, C_1, C_2, C_3, \dots$]
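When Density Does Not Exist
$D(f\|f_j) \to 0$ as $j \to \infty$
Universal histogram sequence $\{C_k\}_{k=0}^\infty$
[Figure: partitions $C_0, C_1, C_2, C_3, \dots$ of the $x$ axis, with $\mu$, $\sigma$ and $-\sigma$ marked]
Suzuki 2013: for any (generalized) density $f$, as $n \to \infty$ with probability 1,
$$\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0$$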
When Density Does Not Exist
Computing $g^n(x^n)$
Input $x^n \in A^n$, output $g^n(x^n)$:
1. For each $k = 1,\dots,K$, $\log g^n_k(x^n) := 0$
2. For each $k = 1,\dots,K$ and each $a \in A_k$, $c_k(a) := 0$
3. For each $i = 1,\dots,n$ and each $k = 1,\dots,K$:
   1. find the cell $a_i \in A_k$ that contains $x_i \in A$
   2. $\log g^n_k(x^n) := \log g^n_k(x^n) + \log\dfrac{c_k(a_i) + 1/2}{i - 1 + |A_k|/2} - \log\eta_X(a_i)$
   3. $c_k(a_i) := c_k(a_i) + 1$
4. $g^n(x^n) := \dfrac{1}{K}\sum_{k=1}^K g^n_k(x^n)$
Universal measure w.r.t. $A_k$:
$$R^n_k(x^n) = \prod_{i=1}^n \frac{c(a^{(k)}_i) + 1/2}{i - 1 + |A_k|/2},$$
where $c(a^{(k)}_i)$ is the number of $x_1,\dots,x_{i-1}$ that fall in the cell $a^{(k)}_i$.
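The steps above can be sketched in Python under the following assumptions: $x_i \in [0,1)$, dyadic partitions $A_k$ with $2^k$ cells of width $\eta_X(a) = 2^{-k}$, and the uniform weights $1/K$ of step 4. The function name `log_g_mixture` is hypothetical. The sketch keeps logarithms to avoid underflow and takes the final average in the probability domain with a log-sum-exp.

```python
import math

def log_g_mixture(xs, K):
    """Return log g^n(x^n), where g^n = (1/K) * sum_{k=1}^{K} g_k^n, following steps 1-4.

    Assumption: x_i in [0, 1) and A_k is the dyadic partition of [0, 1) into 2**k
    intervals, so eta_X(a) = 2**-k for every cell a (Lebesgue measure).
    """
    log_g = [0.0] * K                                    # step 1: log g_k^n := 0
    counts = [[0] * (2 ** k) for k in range(1, K + 1)]   # step 2: c_k(a) := 0
    for seen, x in enumerate(xs):                        # step 3; seen = i - 1
        for k in range(1, K + 1):
            m = 2 ** k                                   # |A_k|
            a = min(int(x * m), m - 1)                   # 3.1: cell a_i of A_k holding x_i
            c = counts[k - 1]
            log_g[k - 1] += (math.log((c[a] + 0.5) / (seen + m / 2))
                             + k * math.log(2))          # 3.2: minus log eta_X(a_i)
            c[a] += 1                                    # 3.3
    top = max(log_g)                                     # step 4, as a log-sum-exp
    return top + math.log(sum(math.exp(v - top) for v in log_g) / K)

xs = [(k + 0.5) / 50 for k in range(50)]
print([round(log_g_mixture(xs, K), 3) for K in (1, 2, 4)])
```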
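When Density Does Not Exist
Computation: $O(nN^2K)$
Computing $g^n(x^n)$ and $g^n(x^n, y^n)$ takes $O(nN^2K)$ time ($O(nN^2)$ in the discrete case):
it is proportional to $n$ and to $N + N(N-1)/2$, the number of single-variable and pairwise measures;
it is proportional to $K$, since $a^{(1)}_i \mapsto a^{(2)}_i \mapsto \cdots \mapsto a^{(K)}_i$ is found by binary search.
$g^n(x^n, y^n)$ can be obtained by $\sum_{k=1}^K \omega_k\, g^n_{k,k}(x^n, y^n)$ rather than $\sum_{j=1}^J\sum_{k=1}^K \omega_{jk}\, g^n_{jk}(x^n, y^n)$.
Computing MI and finding the forest: $N(N-1)/2$ pairs.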
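Assume nested dyadic partitions of $[0,1)$, chosen here for illustration. Each cell at level $k+1$ is one of the two halves of its parent at level $k$. The whole chain $a^{(1)}_i \mapsto \cdots \mapsto a^{(K)}_i$ therefore costs one comparison per level, which is a binary search descending one level at a time. The function name is hypothetical.

```python
def cells_by_bisection(x, K):
    """Return the cell indices a^(1), ..., a^(K) of x in [0, 1) for nested dyadic partitions.

    Level k has 2**k cells; each step halves the current cell, so O(K) comparisons suffice.
    """
    lo, hi, idx = 0.0, 1.0, 0
    path = []
    for _ in range(K):
        mid = (lo + hi) / 2
        if x < mid:
            hi, idx = mid, 2 * idx        # left half
        else:
            lo, idx = mid, 2 * idx + 1    # right half
        path.append(idx)
    return path

print(cells_by_bisection(0.3, 3))   # [0, 1, 2]
```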
When Density Does Not Exist
Bayesian Estimator of Mutual Information
$$J(i,j) = \frac{1}{n}\log\frac{g^n(i,j)}{g^n(i)\,g^n(j)} - \frac{1}{n}\log\frac{1-p_{ij}}{p_{ij}}$$
age height menarche sex igf1 tanner testvol weight
age NA 0.7627465 0.8521553 0.01010264 0.5138440 0.52534862 0.1997714 0.6091554
height NA NA 0.6706380 0.26225428 0.4132932 0.68547041 0.3105466 0.9269808
menarche NA NA NA 0.68786102 0.4919746 0.84283639 0.0000000 0.6456718
sex NA NA NA NA 0.2778511 0.08923994 0.1083901 0.1925525
igf1 NA NA NA NA NA 0.47529101 0.2272998 0.3722551
tanner NA NA NA NA NA NA 0.3796768 0.6420483
testvol NA NA NA NA NA NA NA 0.2409487
weight NA NA NA NA NA NA NA NA
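The following sketches the continuous-case $J(i,j)$ under several assumptions. Both variables are scaled to $[0,1)$. Level $k$ uses a grid of $2^k$ cells per axis, with KT measures and uniform weights $\omega_k = 1/K$. The joint measure uses only the diagonal levels $(k,k)$, as in the $O(nN^2K)$ computation. The function names and the defaults `K=4`, `p_ij=0.5` are illustrative.

```python
import math

def log_g_levels(points, K, dim):
    """log g_k^n for k = 1..K, for points in [0, 1)^dim on the level-k grid of 2**k cells
    per axis. Sequential KT update; each cell has Lebesgue measure 2**(-k * dim)."""
    result = []
    for k in range(1, K + 1):
        side = 2 ** k
        m = side ** dim                       # number of cells at level (k, ..., k)
        counts = {}
        total = 0.0
        for seen, p in enumerate(points):
            cell = tuple(min(int(v * side), side - 1) for v in p)
            c = counts.get(cell, 0)
            total += math.log((c + 0.5) / (seen + m / 2)) + k * dim * math.log(2)
            counts[cell] = c + 1
        result.append(total)
    return result

def log_uniform_mixture(log_values):
    """log((1/K) * sum_k exp(v_k)): the mixture with weights omega_k = 1/K."""
    top = max(log_values)
    return top + math.log(sum(math.exp(v - top) for v in log_values) / len(log_values))

def bayes_mi_continuous(xs, ys, K=4, p_ij=0.5):
    """J(i, j) = (1/n) log g(i,j) / (g(i) g(j)) - (1/n) log((1 - p_ij) / p_ij).
    The joint measure mixes only the diagonal levels (k, k)."""
    n = len(xs)
    log_gx = log_uniform_mixture(log_g_levels([(x,) for x in xs], K, 1))
    log_gy = log_uniform_mixture(log_g_levels([(y,) for y in ys], K, 1))
    log_gxy = log_uniform_mixture(log_g_levels(list(zip(xs, ys)), K, 2))
    return (log_gxy - log_gx - log_gy) / n - math.log((1 - p_ij) / p_ij) / n

xs = [(k + 0.5) / 500 for k in range(500)]
print(bayes_mi_continuous(xs, xs))   # strongly dependent pair: clearly positive
```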
When Density Does Not Exist
R package ISwR, data set juul2
The juul data frame has 1339 rows and 6 columns. It contains a reference sample of the distribution of insulin-like growth factor (IGF-I), one observation per subject in various ages, with the bulk of the data collected in connection with school physical examinations.
[Figure: graph over the variables weight, height, sex, age, tanner, igf1, menarche, testvol]
When Density Does Not Exist
Experiments
n                         100      500      1000     2000
$J_n(i,j)$                0.90     0.99     1.86     3.15
HSIC                      0.50     9.51     40.28    185.53
(a) N = 4

n                         100      500      1000     2000
perfectly matching rate   0.52     0.60     0.72     0.79
K-L divergence loss       0.0169   0.00303  0.00152  0.000405
execution time (sec)      1.64     12.71    22.45    51.24
(b) N = 4

n                         100      500      1000     2000
perfectly matching rate   0.18     0.31     0.38     0.59
K-L divergence loss       0.0652   0.00800  0.00575  0.00298
execution time (sec)      4.27     24.44    52.5     116.1
When Density Does Not Exist
Experiments
data.frame         n      N    type of each variable (d: discrete, c: continuous)   time (sec)
airquality         153    6    (d,d,c,d,d,d)                                         10.47
anscombe           51     4    (d,c,c,d)                                             3.32
attenu             182    5    (d,c,d,c,c)                                           9.64
attitude           30     7    (d,d,d,d,d,d,d)                                       4.26
beaver1            114    4    (d,d,c,d)                                             2.54
beaver2            100    4    (d,d,c,d)                                             2.73
BOD                6      2    (d,c)                                                 0.11
cars               50     2    (d,d)                                                 0.80
ChickWeight        578    4    (d,d,d,d)                                             13.01
chickwts           71     2    (d,d)                                                 0.98
CO2                84     5    (d,d,d,d,c)                                           3.33
DNase              176    3    (d,c,c)                                               2.36
esoph              88     5    (d,d,d,d,d)                                           2.12
faithful           272    2    (c,d)                                                 1.52
Formaldehyde       6      2    (c,c)                                                 0.18
freeny             39     5    (c,c,c,c,c)                                           2.57
Indometh           66     3    (d,c,c)                                               0.97
infert             248    8    (d,d,d,d,d,d,d,d)                                     13.91
InsectSprays       72     2    (d,d)                                                 0.23
iris               150    5    (c,c,c,c,d)                                           6.94
LifeCycleSavings   50     5    (c,c,c,c,c)                                           3.1
Loblolly           84     3    (c,d,d)                                               1.01
longley            16     7    (c,c,c,c,c,d,c)                                       2.26
morley             100    3    (d,d,d)                                               1.21
mtcars             32     11   (c,c,c,c,c,c,c,c,c,c,c)                               6.73
Orange             35     3    (d,d,d)                                               0.5
OrchardSprays      64     4    (d,d,d,d)                                             1.09
PlantGrowth        30     2    (c,d)                                                 0.16
pressure           19     2    (d,c)                                                 0.22
Puromycin          23     3    (c,d,d)                                               0.34
quakes             1000   5    (c,c,c,c,d)                                           56.12
sleep              20     3    (c,c,d)                                               0.48
stackloss          21     4    (d,d,d,d)                                             0.53
swiss              47     6    (c,c,d,d,c,c)                                         4.18
Theoph             132    5    (d,c,c,c,c)                                           6.94
ToothGrowth        60     4    (d,c,d,c)                                             1.11
trees              31     3    (c,d,c)                                               0.58
USArrests          50     4    (c,d,d,c)                                             1.87
USJudgeRatings     43     12   (c,c,c,c,c,c,c,c,c,c,c,c)                             13.66
warpbreaks         54     3    (d,d,d)                                               0.27
women              15     2    (d,d)                                                 0.9
Conclusion
Conclusion
Establish Chow-Liu learning based on MDL without assuming that the variables are either discrete or continuous
Theoretical Analysis w.r.t. n,N,K (K : quantization depth)
Realistic Computation using R
Insight:
The implementation is not hard
The computation is proportional to K
Future Works:
Optimal K w.r.t. n,N
Exponential Memory w.r.t. K
R Package Publication