59
Nov. 20–23, 2019 IBIS 2019 隣接代数と双対平坦構造を 用いた学習 杉山 麿人(国立情報学研究所,JST さきがけ研究者)

隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Nov. 20–23, 2019IBIS 2019

隣接代数と双対平坦構造を用いた学習

杉山 麿人(国立情報学研究所,JSTさきがけ研究者)

Page 2: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Matrix Balancingp₁₁ p₁₂

p₂₁ p₂₂

1/41

Page 3: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Matrix Balancing

r₁s₁ p₁₁ r₁s₂ p₁₂

r₂s₁ p₂₁ r₂s₂ p₂₂

p₁₁ p₁₂

p₂₁ p₂₂

s₁ 0

0 s₂

r₁ 0

0 r₂

∑j r₁sj p₁j = 1

∑j r₂sj p₂j = 1

∑i ris₁ pi₁ = 1 ∑i ris₂ pi₂ = 1

=

Find r and s:(Make doublystochasticmatrix)

1/41

Page 4: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Sinkhorn-Knopp Algorithm• Alternately rescale all rows and columnsof a matrix 𝑃 to sum to 1

• Commonly used to compute entropy-regularizedOptimal transport (Wasserstein distance)

2/41

Page 5: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Revisit Matrix Balancing [ICML2017]

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

3/41

Page 6: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜼

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

η₁₁

η₂₁

η₃₁

η₁₂ η₁₃

4/41

Page 7: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜼

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

η₁₁

η₂₁

η₃₁

η₁₂ η₁₃

4/41

Page 8: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜼

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

η₁₁

η₂₁

η₃₁

η₁₂ η₁₃

4/41

Page 9: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜼

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

η₁₁

η₂₁

η₃₁

η₁₂ η₁₃

4/41

Page 10: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜼

3

2

1

2 1p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

η₁₁

η₂₁

η₃₁

η₁₂ η₁₃

4/41

Page 11: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜽

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

θ₂₂ θ₂₃

θ₃₂ θ₃₃

θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁

–+

5/41

Page 12: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜽

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

θ₂₂ θ₂₃

θ₃₂ θ₃₃

θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁

–+

5/41

Page 13: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜽

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

θ₂₂ θ₂₃

θ₃₂ θ₃₃

θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁

–+

5/41

Page 14: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce 𝜽

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

θ₂₂ θ₂₃

θ₃₂ θ₃₃

θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁

–+

5/41

Page 15: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Balancing as Constraints on 𝜼 and 𝜽

3

2

1

2 1p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

η₁₁

η₂₁

η₃₁

η₁₂ η₁₃

θ₂₂ θ₂₃

θ₃₂ θ₃₃

θij = log pijlog pi–₁j – log pij–₁log pi–₁j–₁

–+

Matrix balancing ⇔Satisfy ηi₁ = η₁i = 3 – i + 1with keeping all θij

6/41

Page 16: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Natural Gradient• Given 𝑃 ∈ ℝ𝑛×𝑛, introduce (𝜃, 𝜂) aslog𝑝𝑖𝑗 =

𝑘≤𝑖

𝑙≤𝑗𝜃𝑘𝑙, 𝜂𝑖𝑗 =

𝑘≥𝑖

𝑙≥𝑗𝑝𝑘𝑙

• Let 𝐼 = {11, 12,… , 1𝑛, 21,… , 𝑛1}, 𝜽 = (𝜃𝑖)𝑇𝑖∈𝐼 , 𝜼 = (𝜂𝑖)𝑇𝑖∈𝐼• Using Fisher information matrix 𝐺 ∈ ℝ|𝐼|×|𝐼| s.t.𝑔(𝑖𝑗)(𝑘𝑙) = 𝜂max{𝑖,𝑘}max{𝑗,𝑙} − 𝜂𝑖𝑗𝜂𝑘𝑙, update formula is𝜽next = 𝜽 − 𝐺−1(𝜼 − 𝜼correct)

7/41

Page 17: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Results on Hessenberg MatrixN

umbe

r of i

tera

tion

100 500 5000100102104106

nnRu

nnin

g tim

e (s

ec.)

100 500 5000

10410210010–2

106108

> x1000faster

NaturalBNEWTSinkhorn

8/41

Page 18: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introduce Partial Order Structure

p₁₁ p₁₂ p₁₃

p₂₁ p₂₂ p₂₃

p₃₁ p₃₂ p₃₃

s(p(s), θₛ, ηₛ)

9/41

Page 19: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Partially Ordered Sets (Posets)

{a,b,c}

{a,b} {a,c} {b,c}

{a} {b} {c}

{a,b,c}

{a,b} {a,c} {b,c}

{a} {b} {c}

2{a,b,c} ℕ² Network

10/41

Page 20: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Incidence Algebra• Incidence algebra is defined over a poset (𝑆,≤)

– (Closed) Interval [𝑎, 𝑏] = {𝑠 ∈ 𝑆 ∣ 𝑎 ≤ 𝑠 ≤ 𝑏}

• Members of the incidence algebra arefunctions 𝑓(𝑎, 𝑏) from intervals [𝑎, 𝑏] to a scalar with(𝑓 + 𝑔)(𝑎, 𝑏) = 𝑓(𝑎, 𝑏) + 𝑔(𝑎, 𝑏)

(𝑓𝑔)(𝑎, 𝑏) =∑

𝑎≤𝑥≤𝑏𝑓(𝑎, 𝑥)𝑔(𝑥, 𝑏) (convolution)

11/41

Page 21: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Analogy to Matrix Multiplication• For [𝑎, 𝑏], define 𝒇,𝒈 ∈ ℝ|[𝑎,𝑏]| as

𝒇 =⎡⎢⎣

𝑓(𝑎, 𝑎)⋮

𝑓(𝑎, 𝑏)

⎤⎥⎦, 𝒈 =

⎡⎢⎣

𝑔(𝑎, 𝑎)⋮

𝑔(𝑏, 𝑎)

⎤⎥⎦

• For 𝑓 and 𝑔,(𝑓𝑔)(𝑎, 𝑏) = 𝒇𝑇𝒈

12/41

Page 22: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Special Elements• Delta function 𝛿:

𝛿(𝑎, 𝑏) = { 1 if 𝑎 = 𝑏0 otherwise

• Zeta function 𝜁: (integral)

𝜁(𝑎, 𝑏) = { 1 if 𝑎 ≤ 𝑏0 otherwise

• Möbius function 𝜇 = 𝜁−1: 𝜁𝜇 = 𝛿 (differential)13/41

Page 23: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Möbius Inversion Formula• Given a poset 𝑆, for any functions 𝑓, 𝑔 ∶ 𝑆 → ℝ,the Möbius inversion formula is given as⎧⎪

⎨⎪⎩

𝑔(𝑥) =∑

𝑠∈𝑆𝜁(𝑠, 𝑥)𝑓(𝑠) =

𝑠≤𝑆𝜁(𝑠, 𝑥)𝑓(𝑠) =

𝑠≤𝑥𝑓(𝑠)

𝑓(𝑥) =∑

𝑠∈𝑆𝜇(𝑠, 𝑥)𝑔(𝑠) =

𝑠≤𝑆𝜇(𝑠, 𝑥)𝑔(𝑠)

14/41

Page 24: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

E.g.1: Inclusion-Exclusion Principle

A∩B∩C

A∪B∪C

A∩B A∩C B∩C

A B C

A∩B∩C

A∪B∪C

A∩B A∩C B∩C

A B C f (X) = |X| g(X) = |X \ ⋃ Y|Y ⊂ X

f (X) = ∑Y≤X g(X)g(X) = ∑Y≤X μ(Y,X) f (Y)

|A∪B∪C|= |A|+|B|+|C|–|A∪B|–|A∪C|–|B∪C|+|A∩B∩C|

15/41

Page 25: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

E.g.2: Divisibility• Divisibility poset: 𝑎 ≤ 𝑏 ⇐⇒ 𝑏|𝑎 (𝑎 divides 𝑏)• The Möbius function: 𝑛 = 𝑏∕𝑎 and

𝜇(𝑛) = { (−1)𝑘 if 𝑛 = 𝑝1𝑝2…𝑝𝑘 for 𝑘 distinct primes0 otherwise

– 𝜇(𝑎, 𝑏) = 𝜇(𝑏|𝑎), the Möbius function in number theory• The Riemann zeta function 𝜁 is given by1∕𝜁(𝑠) =

∑∞

𝑛=1𝜇(𝑛)∕𝑛𝑠

16/41

Page 26: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Log-Linear Model on Poset [ICML2017]

• For probability 𝑝∶𝑆 → (0, 1) with∑𝑥∈𝑆 𝑝(𝑥) = 1,introduce 𝜽 and 𝜼 as𝜃𝑥 =

∑𝑠∈𝑆

𝜇(𝑠, 𝑥) log𝑝(𝑠),

𝜂𝑥 =∑

𝑠∈𝑆𝜁(𝑥, 𝑠)𝑝(𝑠) =

∑𝑠≥𝑥

𝑝(𝑠)

• From the Möbius inversion formula, log-linear model is:log𝑝(𝑥) =

∑𝑠∈𝑆

𝜁(𝑠, 𝑥)𝜃𝑠 =∑

𝑠≤𝑥𝜃𝑠

17/41

Page 27: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Log-Linear Model on Poset

x

(p(x), θx, ηx)

18/41

Page 28: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Log-Linear Model on Poset

x

(p(x), θx, ηx)

log p(x) = ∑s≤x θs

18/41

Page 29: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Log-Linear Model on Poset

x

(p(x), θx, ηx)

log p(x) = ∑s≤x θs

ηx = ∑s≥x p(s)

18/41

Page 30: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Exponential Family• The log-linear model on posets belongs tothe exponential family

• 𝜽 : Natural parameter• 𝜼 : Expectation parameter

19/41

Page 31: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Binary Log-Linear Model [AAAI2019](= Boltzmann machine)

{a,b,c}

{a,b} {a,c} {b,c}

{a} {b} {c}

{a,b,c}

{a,b} {a,c} {b,c}

{a} {b} {c}

2{a,b,c}

20/41

Page 32: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Binary Log-Linear Model [AAAI2019](= Boltzmann machine)

{a,b,c}

{a,b} {a,c} {b,c}

{a} {b} {c}

{a,b,c}

{a,b} {a,c} {b,c}

{a} {b} {c}

2{a,b,c}2{a,b,c}

{a,b,c}

{a,b} {a,c}

{a} {b}

{a,b,c}

{a,b} {a,c}

{a} {b}

{b,c}

{c}{c}

log p(x) = ∑s≤x θs

ηx = ∑s≥x p(s)

20/41

Page 33: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Binary Log-Linear Model [AAAI2019](= Boltzmann machine)

000

100

110 101

111

000

100

110 101

111{0, 1}3

011011

001001010010

21/41

Page 34: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Binary Log-Linear Model [AAAI2019](= Boltzmann machine)

000

100

110 101

111

000

100

110 101

111{0, 1}3

011011

001001010010

log p(x) = ∑s≤x θs

= –ψ + ∑i θixi + ∑i,jθijxixj + ···

For x with xi₁ = ··· = xik = 1,ηx = ∑s≥x p(s)

= �[xi₁···xik] = Pr(xi₁ = ··· = xik = 1)000

100

110 101

111

000

100

110 101

111{0, 1}3

011

001001010010

21/41

Page 35: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Dually Flat Structure• Let 𝜓(𝜃) = −𝜃(⊥) (convex, partition function)

𝜓(𝜃)Legendre transformation,,,,,,,,,,,,,,,,,,,,,,→ 𝜙(𝜂) =

𝑥∈𝑆𝑝(𝑥) log𝑝(𝑥)

• (𝜓(𝜃), 𝜙(𝜂)) leads to dually flat coordinate system (𝜃, 𝜂):

∇𝜓(𝜃) = 𝜂, 𝜕𝜕𝜃𝑥

𝜓(𝜃) = 𝜂𝑥

∇𝜙(𝜂) = 𝜃, 𝜕𝜕𝜂𝑥

𝜙(𝜂) = 𝜃𝑥22/41

Page 36: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Riemannian Metric (Fisher Information)

𝜕𝜕𝜃𝑥

𝜕𝜕𝜃𝑦

𝜓(𝜃) = 𝜕𝜕𝜃𝑥

𝜂𝑦 =∑

𝑠∈𝑆𝜁(𝑥, 𝑠)𝜁(𝑦, 𝑠)𝑝(𝑠) − 𝜂𝑥𝜂𝑦

𝜕𝜕𝜂𝑥

𝜕𝜕𝜂𝑦

𝜙(𝜂) = 𝜕𝜕𝜂𝑥

𝜃𝑦 =∑

𝑠∈𝑆𝜇(𝑠, 𝑥)𝜇(𝑠, 𝑦)𝑝(𝑠)−1

𝔼𝑠 [𝜕𝜕𝜃𝑥

log𝑝(𝑠) 𝜕𝜕𝜂𝑦

log𝑝(𝑠)] = 𝛿(𝑥, 𝑦)

23/41

Page 37: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Riemannian Metric (Fisher Information)

𝔼𝑠 [𝜕𝜕𝜃𝑥

log𝑝(𝑠) 𝜕𝜕𝜃𝑦

log𝑝(𝑠)] =∑

𝑠∈𝑆𝜁(𝑥, 𝑠)𝜁(𝑦, 𝑠)𝑝(𝑠) − 𝜂𝑥𝜂𝑦

𝔼𝑠 [𝜕𝜕𝜂𝑥

log𝑝(𝑠) 𝜕𝜕𝜂𝑦

log𝑝(𝑠)] =∑

𝑠∈𝑆𝜇(𝑠, 𝑥)𝜇(𝑠, 𝑦)𝑝(𝑠)−1

𝔼𝑠 [𝜕𝜕𝜃𝑥

log𝑝(𝑠) 𝜕𝜕𝜂𝑦

log𝑝(𝑠)] = 𝛿(𝑥, 𝑦)

24/41

Page 38: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Mixed Coordinate System• Many problems are formulated as coordinate mixing

P = ( θ1, θ2, ..., θi–1, θi, θi+1, ..., θₙ )

Q = ( η1, η2, ..., ηi–1, θi, θi+1, ..., θₙ )

R = ( η1, η2, ..., ηi–1, ηi, ηi+1, ..., ηₙ )

e-projection(MLE)m-projection

Pythagorean theorem:KL(P, R) = KL(P, Q) + KL(Q, R)

(Q is always unique)

25/41

Page 39: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Mixed Coordinate System (Example)• Many problems are formulated as coordinate mixing

P = ( , , ..., , , , ..., )

Q = ( η1, η2, ..., ηi–1, , , ..., )

R = ( η1, η2, ..., ηi–1, ηi, ηi+1, ..., ηₙ )

Pythagorean theorem:KL(P, R) = KL(P, Q) + KL(Q, R)

(Q is always unique)

0 0 0 0 0 0

0 0 0

Uniform dist.

Empirical dist.

26/41

Page 40: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Two Submanifolds

P = (θ1,θ2,...,θi–1,θi,θi+1,...,θₙ)

R = (η1,η2,...,ηi–1,ηi,ηi+1,...,ηₙ)

Q = (η1,η2,...,ηi–1,θi,θi+1,...,θₙ)

Fix

Fix

e-projection (Model manifold)

(Data manifold)27/41

Page 41: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Gradient methods for e-projection• e-projection is convex optimization• Gradient descent (first-order):𝜃next, 𝑖 ← 𝜃𝑖 − 𝜖(𝜂𝑖 − �̂�target, 𝑖)

• Natural gradient (second-order)𝜽next ← 𝜽 − 𝐺−1(𝜼 − 𝜼target)– 𝐺 is Fisher information matrix w.r.t. 𝜃

• Coordinate descent [IBIS2019]28/41

Page 42: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Boltzmann Machine Training

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111Boltzmannmachine

Samplespace

29/41

Page 43: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Boltzmann Machine Training

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111Boltzmannmachine

Samplespace θ = 0

η = η

29/41

Page 44: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Matrix Balancing

θ = θ

η = 3 – i + 1

3x3 matrixas poset:

30/41

Page 45: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Legendre Decomposition [NeurIPS 2018]

3x3x3 tensoras poset:

31/41

Page 46: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Legendre Decomposition [NeurIPS 2018]

3x3x3 tensoras poset:

θ = 0η = η

31/41

Page 47: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introducing Hidden Variables in BM

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111BM withhidden variables

Samplespace

32/41

Page 48: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introducing Hidden Variables in BM

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111BM withhidden variables

Samplespace η0101 = ?

32/41

Page 49: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introducing Hidden Variables in BM

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111BM withhidden variables

Samplespace θ = 0

η = η

Need to estimate ηvia EM nonconvexNeed to estimate ηvia EM nonconvex

32/41

Page 50: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introducing Hidden States

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111Boltzmannmachine

Samplespace

33/41

Page 51: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introducing Hidden States

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111Boltzmannmachine

Samplespace

Hiddenstates

34/41

Page 52: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introducing Hidden States

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111Boltzmannmachine

Samplespace

Hiddenstates

Optimization isstill convex!

ηx

x

35/41

Page 53: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Introducing Hidden States

00001000 0100

1100

1110

1111

00001000 0100

1100 10101010

1110

1111Boltzmannmachine

Samplespace θ = 0

η = η

36/41

Page 54: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Blind Source Separation [arXiv]

SourceMixing Received

e.g. image

37/41

Page 55: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Blind Source Separation [arXiv]

SourceMixing Received

e.g. image

θ = 0η = η37/41

Page 56: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Results of BSS

0.270320.436300.621950.37167

IGBSSFastICANMFDicLearn

Source

Received

Reconst.

Method RMSE

38/41

Page 57: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Relationship to Homology• Möbius function is Euler characteristics• Consider order complex ∆(𝑆) of a poset 𝑆 with ⊥,⊤ ∈ 𝑆

(i) Vertices of ∆(𝑆) are elements of 𝑆(ii) Faces of ∆(𝑆) are chains of 𝑆

• For the Euler characteristic 𝜒(∆(𝑆)),𝜇(⊥,⊤) = 𝜒(∆(𝑆)) + 1– Two spaces are homotopy equivalent⇒ Euler characteristics are the same

39/41

Page 58: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Summary: Recipe for Poset-LogLinear1. Treat the target as a poset

2. Introduce the log-linear model on poset

3. Formulate the objective as coordinate mixing

4. Solve it by a gradient method

(This slide is at https://mahito.nii.ac.jp/)

40/41

Page 59: 隣接代数と双対平坦構造を 用いた学習...3x3x3 tensor as poset: = 0 = 31/41 IntroducingHiddenVariablesinBM 0000 10000100 1100 1110 1111 0000 10000100 110010101010 1110

Acknowledgment• Tsuda, K. (UTokyo), Nakahara, H. (RIKEN CBS)• Yamada, R. (KyotoU), Mimura, K. (HiroshimaCU)• Luo, S., Azizi, L. (USydney)• Hayashi, S., Matsushima, S. (UTokyo)• Borgwardt, K. and his lab members (ETH Zürich)• Lab members at NII

41/41