A Video from Google DeepMind
http://www.nature.com/nature/journal/v518/n7540/fig_tab/nature14236_SV2.html
Can machine learning teach us cluster updates?
Lei Wang, Institute of Physics, CAS, https://wangleiphy.github.io
Li Huang and LW, 1610.02746; LW, 1702.08586
Local vs. Cluster algorithms
A local update is slower than a cluster update.
Algorithmic innovation outperforms Moore's law!
Recommender Systems
Learn preferences
Recommendations
Restricted Boltzmann Machine
Smolensky 1986; Hinton and Sejnowski 1986

Energy-based model with visible units x_1, x_2, ..., x_N and hidden units h_1, h_2, ..., h_M:

E(x, h) = -\sum_{i=1}^{N} a_i x_i - \sum_{j=1}^{M} b_j h_j - \sum_{i=1}^{N}\sum_{j=1}^{M} x_i W_{ij} h_j,
x \in \{0, 1\}^N, \quad h \in \{0, 1\}^M

Probabilities:
p(x, h) \sim e^{-E(x, h)}, \quad p(x) = \sum_h p(x, h), \quad p(h|x) = p(x, h)/p(x)

Universal approximator of probability distributions: Freund and Haussler, 1989; Le Roux and Bengio, 2008
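Because the RBM's coupling graph is bipartite, the conditional p(h|x) factorizes over hidden units. A minimal NumPy sketch of the energy and this conditional, using hypothetical random parameters (the sizes N, M and the values of a, b, W are illustrative, not from any trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small RBM: N visible, M hidden units with random parameters.
N, M = 6, 4
a = rng.normal(size=N)        # visible biases a_i
b = rng.normal(size=M)        # hidden biases b_j
W = rng.normal(size=(N, M))   # couplings W_ij

def energy(x, h):
    """E(x, h) = -a.x - b.h - x.W.h for binary vectors x, h."""
    return -(a @ x) - (b @ h) - x @ W @ h

def p_h_given_x(x):
    """Bipartite structure => p(h|x) factorizes:
    p(h_j = 1 | x) = sigmoid(b_j + sum_i x_i W_ij)."""
    return 1.0 / (1.0 + np.exp(-(b + x @ W)))

x = rng.integers(0, 2, size=N)
h = rng.integers(0, 2, size=M)
print(energy(x, h), p_h_given_x(x))
```

The symmetric conditional p(x|h) factorizes the same way, which is what makes blocked Gibbs sampling cheap.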
Generative Learning
Hinton 2002

Learning: fit the model to the target distribution, p(x) \sim \pi(x). Learn the model from data.
Generating: blocked Gibbs sampling, x \to h \to x', alternating p(h|x) and p(x'|h). Generate more data from the learned model.
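The blocked Gibbs chain x → h → x' can be sketched as below. The RBM parameters here are random placeholders standing in for a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small RBM; in practice (a, b, W) would come from training.
N, M = 6, 4
a = rng.normal(size=N)
b = rng.normal(size=M)
W = rng.normal(size=(N, M))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(x):
    """One blocked Gibbs sweep x -> h -> x'.

    The bipartite structure lets all hidden units (then all visible
    units) be sampled in parallel from their exact conditionals."""
    h = (rng.random(M) < sigmoid(b + x @ W)).astype(int)      # sample p(h|x)
    x_new = (rng.random(N) < sigmoid(a + W @ h)).astype(int)  # sample p(x'|h)
    return x_new

x = rng.integers(0, 2, size=N)
for _ in range(100):
    x = gibbs_step(x)
print(x)
```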
MNIST database of handwritten digits
Learned weights W_{ij} of a 784-100 RBM (visible units x_1, ..., x_784; hidden units h_1, ..., h_100)
Theano deep learning tutorial: http://www.deeplearning.net/tutorial/rbm.html#rbm
Challenge: which one is written by a human?
"Good Generative Model?" MNIST Handwritten Digit Dataset
Salakhutdinov, 2015 Deep Learning Summer School
Idea
Model the QMC data with an RBM, then sample from the RBM.
Li Huang and LW, 1610.02746
cf. Liu, Qi, Meng, Fu, 1610.03137; Liu, Shen, Qi, Meng, Fu, 1611.09364; Xu, Qi, Liu, Fu, Meng, 1612.03804
Why is it useful?
• When the fitting is perfect, we can completely bypass the "quantum" part of the QMC
• Even with an imperfect fitting, the RBM can still guide the QMC sampling
• Bonus: it is also fun to see what it discovers about the physical model
The RBM is a recommender system for QMC simulations.
cf. Torlai, Melko, 1606.02718
Learned weights of an 8×8 Falicov-Kimball model
Accept or not? The art of Monte Carlo methods
Li Huang and LW, 1610.02746

Sample the RBM, and propose the move to QMC: x \to h \to x' via p(h|x) and p(x'|h).

Metropolis-Hastings acceptance:
A(x \to x') = \min\left[1, \frac{T(x' \to x)}{T(x \to x')} \frac{\pi(x')}{\pi(x)}\right]

The detailed balance condition for the RBM gives
\frac{T(x' \to x)}{T(x \to x')} = \frac{p(x)}{p(x')}

so the acceptance becomes
A(x \to x') = \min\left[1, \frac{p(x)}{p(x')} \cdot \frac{\pi(x')}{\pi(x)}\right]
Acceptance rate of the recommended update (RBM ↔ physical model)
"surrogate function": R. M. Neal, Bayesian Learning for Neural Networks, 1996; J. S. Liu, Monte Carlo Strategies in Scientific Computing, 2008
"force bias": S. Zhang, Auxiliary-Field QMC for Correlated Electron Systems, 2013
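The RBM-recommended update can be sketched as below. The RBM marginal \ln p(x) has the closed form a \cdot x + \sum_j \ln(1 + e^{b_j + \sum_i x_i W_{ij}}); the target log-weight `log_pi` here is a hypothetical quadratic stand-in for the physical QMC weight, and all parameters are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sizes and random stand-in parameters (illustrative only).
N, M = 6, 4
a = rng.normal(size=N); b = rng.normal(size=M); W = rng.normal(size=(N, M))
J = rng.normal(size=(N, N)); J = (J + J.T) / 2  # toy "physical" couplings

def log_p(x):
    """Unnormalized RBM log-marginal:
    ln p(x) = a.x + sum_j ln(1 + exp(b_j + sum_i x_i W_ij))."""
    return a @ x + np.sum(np.logaddexp(0.0, b + x @ W))

def log_pi(x):
    """Hypothetical target, standing in for the physical QMC weight."""
    return x @ J @ x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propose(x):
    """One RBM Gibbs step x -> h -> x', used as the proposal T(x -> x')."""
    h = (rng.random(M) < sigmoid(b + x @ W)).astype(int)
    return (rng.random(N) < sigmoid(a + W @ h)).astype(int)

def metropolis_step(x):
    x_new = propose(x)
    # A = min[1, p(x)/p(x') * pi(x')/pi(x)] from detailed balance of the RBM
    log_A = (log_p(x) - log_p(x_new)) + (log_pi(x_new) - log_pi(x))
    return x_new if np.log(rng.random()) < log_A else x

x = rng.integers(0, 2, size=N)
for _ in range(200):
    x = metropolis_step(x)
print(x)
```

Note the proposal ratio needs only the RBM marginals p(x), p(x'), never the (intractable) normalization, because it cancels in the ratio.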
Question
Can Boltzmann Machines discover cluster updates?
LW, 1702.08586
Cluster Update in a Nutshell
Swendsen and Wang, 1987
1. Sample auxiliary bond variables
2. Build clusters
3. Flip clusters randomly
Rejection-free cluster update!
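The three steps can be sketched for the 2D ferromagnetic Ising model with periodic boundaries. This is a minimal, unoptimized union-find implementation; the lattice size L and the values of beta and J are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal Swendsen-Wang sweep for a 2D ferromagnetic Ising model.
L, beta, J = 8, 0.44, 1.0
spins = rng.choice([-1, 1], size=(L, L))

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def sw_sweep(s):
    n = L * L
    parent = list(range(n))
    p_bond = 1.0 - np.exp(-2.0 * beta * J)  # bond-activation probability
    for x in range(L):
        for y in range(L):
            i = x * L + y
            for (nx, ny) in ((x, (y + 1) % L), ((x + 1) % L, y)):
                j = nx * L + ny
                # Activate a bond only between aligned neighboring spins.
                if s[x, y] == s[nx, ny] and rng.random() < p_bond:
                    ri, rj = find(parent, i), find(parent, j)
                    parent[ri] = rj
    # Flip each cluster with probability 1/2 -- rejection free.
    flip = {r: rng.random() < 0.5 for r in set(find(parent, i) for i in range(n))}
    for x in range(L):
        for y in range(L):
            if flip[find(parent, x * L + y)]:
                s[x, y] *= -1
    return s

for _ in range(10):
    spins = sw_sweep(spins)
print(spins.sum())
```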
Swendsen-Wang ↔ Boltzmann Machine
Ising spins ↔ visible units (s_i = ±1)
Auxiliary bond variables ↔ hidden units (h_\ell = 0, 1)
Build clusters ↔ sample h given s, i.e. p(h|s)
Flip clusters ↔ sample s given h
Boltzmann Machine for cluster update

E(s, h) = -\sum_\ell (W s_{i_\ell} s_{j_\ell} + b) h_\ell,
s \in \{-1, +1\}^{|Vertex|}, \quad h \in \{0, 1\}^{|Edge|}

A hidden unit h_\ell lives on the edge \ell connecting the spins s_{i_\ell} and s_{j_\ell}: h_\ell = 1 activates the bond, while h_\ell = 0 leaves it broken.
Cluster Update in the BM language
1. Sample hidden variables given visible units
2. Inactive hidden units break the links
3. Randomly flip clusters of visible units
Voila!
Boltzmann Machine for cluster update

E(s, h) = -\sum_\ell (W s_{i_\ell} s_{j_\ell} + b) h_\ell,
s \in \{-1, +1\}^{|Vertex|}, \quad h \in \{0, 1\}^{|Edge|}

• Rejection-free Monte Carlo updates when (1 + e^{b+W}) / (1 + e^{b-W}) = e^{2\beta J}
• Encompasses general cluster algorithm frameworks: Niedermeyer, 1988; Kandel and Domany, 1991; Kawashima and Gubernatis, 1995
• In general, with "features" F_\alpha(s): E(s, h) = E(s) - \sum_\alpha \left[W_\alpha F_\alpha(s) + b_\alpha\right] h_\alpha
• Boltzmann Machines are learnable, so they can adapt to various physical problems
• Boltzmann Machines parametrize Monte Carlo policies, which can be optimized for efficiency
• The hidden units learn to play smart roles: Fortuin-Kasteleyn and Hubbard-Stratonovich transformations
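The rejection-free condition can be checked numerically. It is one equation in the two parameters (b, W), so the bias b is a free choice (b = -1 below is arbitrary); substituting u = e^W turns the condition into a quadratic with a unique positive root:

```python
import numpy as np

# Check the rejection-free condition (1+e^{b+W})/(1+e^{b-W}) = e^{2 beta J}:
# for a chosen bias b, solve the quadratic in u = e^W.
beta, J, b = 0.5, 1.0, -1.0   # illustrative values; any b admits a solution

r = np.exp(2.0 * beta * J)
eb = np.exp(b)
# Condition rearranges to: eb * u^2 + (1 - r) * u - r * eb = 0, u = e^W > 0
u = (-(1.0 - r) + np.sqrt((1.0 - r) ** 2 + 4.0 * r * eb * eb)) / (2.0 * eb)
W = np.log(u)

# Tracing out h_l in {0,1} gives bond weights 1 + e^{b±W} for aligned /
# anti-aligned spin pairs; their ratio must reproduce e^{2 beta J}.
ratio = (1.0 + np.exp(b + W)) / (1.0 + np.exp(b - W))
print(W, ratio, r)
```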
Plaquette Ising model

1610.03137: "However, for K≠0, NO simple and efficient global update method is known."

Decouple the plaquette term with a hidden variable h:
e^{-K s_1 s_2 s_3 s_4} \to \sum_h e^{W(s_1 s_2 + s_3 s_4) h}

• Given s, sample h on each plaquette independently
• Given h, the Boltzmann Machine is an ordinary Ising model with modulated interactions, which can be sampled efficiently
• Rejection-free cluster update!

LW, 1702.08586
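The decoupling identity can be verified by brute force over all 16 plaquette spin configurations. The slide does not spell out the domain of h or the sign convention, so this sketch assumes h = ±1 and K ≤ 0, for which cosh(2W) = e^{-2K} has a real solution W; other signs would need a shifted or complex-valued decoupling:

```python
import numpy as np
from itertools import product

# Numerical check of e^{-K s1 s2 s3 s4} ∝ sum_h e^{W (s1 s2 + s3 s4) h},
# assuming h = ±1 and K <= 0 so that cosh(2W) = e^{-2K} is solvable.
K = -0.5                                    # illustrative value
W = 0.5 * np.arccosh(np.exp(-2.0 * K))

def lhs(s):
    s1, s2, s3, s4 = s
    return np.exp(-K * s1 * s2 * s3 * s4)

def rhs(s):
    s1, s2, s3, s4 = s
    return sum(np.exp(W * (s1 * s2 + s3 * s4) * h) for h in (-1, +1))

# The two sides should agree up to one configuration-independent constant.
ratios = [rhs(s) / lhs(s) for s in product((-1, +1), repeat=4)]
print(min(ratios), max(ratios))
```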
Machine learning for many-body physics
• New tools for (quantum) many-body problems
• Moreover, it offers a new way of thinking
• Can we make new scientific discoveries with it?
• Can one design better algorithms with it?
The fun just starts!
LW, 1606.00318; Li Huang and LW, 1610.02746; Li Huang, Yi-feng Yang and LW, 1612.01871; LW, 1702.08586
Quantum many-body physics for ML
MNIST database and random images, from the "Deep Learning" book by Goodfellow, Bengio, Courville, https://www.deeplearningbook.org/

Quantum entanglement perspective on deep learning:
Jing Chen, Song Cheng, Haidong Xie, LW, and Tao Xiang, 1701.04831
Dong-Ling Deng, Xiaopeng Li and S. Das Sarma, 1701.04844
Xun Gao, L.-M. Duan, 1701.05039
Y. Huang and J. E. Moore, 1701.06246
A(x \to x') = \min\left[1, \frac{p(x)}{p(x')} \cdot \frac{\pi(x')}{\pi(x)}\right]
RBM ↔ Physical model
e.g. the Falicov-Kimball (classical field + fermions) model
For the physical model,
\ln[\pi(x)] = -\frac{U}{2} \sum_{i=1}^{N} x_i + \ln\det\left[1 + e^{-\beta H(x)}\right],
H_{ij} = K_{ij} + \delta_{ij} U (x_i - 1/2)

while for the RBM, with visible units x_1, x_2, ..., x_N, hidden units h_1, h_2, ..., h_M, x \in \{0, 1\}^N and h \in \{0, 1\}^M,
\ln[p(x)] = \sum_{i=1}^{N} a_i x_i + \sum_{j=1}^{M} \ln\left(1 + e^{b_j + \sum_{i=1}^{N} x_i W_{ij}}\right)

Train the RBM so that
\sum_{i=1}^{N} a_i x_i + \sum_{j=1}^{M} \ln\left(1 + e^{b_j + \sum_{i=1}^{N} x_i W_{ij}}\right) = -\frac{U}{2} \sum_{i=1}^{N} x_i + \ln\det\left(1 + e^{-\beta H}\right)
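The two log-weights that training tries to match can be evaluated directly. This is a sketch for a toy Falicov-Kimball-like model: the 1D hopping matrix K and all numerical parameters are hypothetical placeholders, and the RBM parameters are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy sizes and parameters (illustrative only).
N, M = 4, 6
beta, U = 2.0, 4.0
K = np.zeros((N, N))
for i in range(N):                 # 1D hopping with periodic boundary
    K[i, (i + 1) % N] = K[(i + 1) % N, i] = -1.0

def log_pi(x):
    """ln pi(x) = -(U/2) sum_i x_i + ln det[1 + e^{-beta H(x)}],
    with H_ij = K_ij + delta_ij U (x_i - 1/2). Since H is symmetric,
    the log-determinant reduces to a sum over its eigenvalues."""
    H = K + np.diag(U * (x - 0.5))
    e = np.linalg.eigvalsh(H)
    return -0.5 * U * x.sum() + np.sum(np.logaddexp(0.0, -beta * e))

a = rng.normal(size=N); b = rng.normal(size=M); W = rng.normal(size=(N, M))

def log_p(x):
    """RBM: ln p(x) = a.x + sum_j ln(1 + e^{b_j + sum_i x_i W_ij})."""
    return a @ x + np.sum(np.logaddexp(0.0, b + x @ W))

# Training would adjust (a, b, W) until log_p tracks log_pi
# (up to a constant) on the QMC configurations.
x = rng.integers(0, 2, size=N).astype(float)
print(log_pi(x), log_p(x))
```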
Supervised learning of the RBM: a network with inputs x_1, x_2, ..., x_N and output F(x), fit to the target p(x) \sim \pi(x).