88
Restricted Boltzmann Machines for Collaborative Filtering Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton November 29, 2016 Binglin Chen RBM for Collaborative Filtering November 29, 2016 1 / 22

Restricted Boltzmann Machines for Collaborative Filteringswoh.web.engr.illinois.edu/courses/IE598/handout/fall... · 2016-12-03 · Restricted Boltzmann Machines for Collaborative

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Restricted Boltzmann Machines for CollaborativeFiltering

Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton

November 29, 2016

Binglin Chen RBM for Collaborative Filtering November 29, 2016 1 / 22

Collaborative Filtering

Wikipedia:

In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).

Fundamental ideas:

If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown

Applications:

Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify

Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22

Collaborative Filtering

Wikipedia:

In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).

Fundamental ideas:

If two items get similar rating patterns then they are probably similar

If two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown

Applications:

Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify

Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22

Collaborative Filtering

Wikipedia:

In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).

Fundamental ideas:

If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated item

Properties of items are unknown

Applications:

Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify

Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22

Collaborative Filtering

Wikipedia:

In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).

Fundamental ideas:

If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown

Applications:

Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify

Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22

Collaborative Filtering

Wikipedia:

In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).

Fundamental ideas:

If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown

Applications:

Amazon (Customers Who Bought This Item Also Bought)

NetflixSpotify

Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22

Collaborative Filtering

Wikipedia:

In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).

Fundamental ideas:

If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown

Applications:

Amazon (Customers Who Bought This Item Also Bought)Netflix

Spotify

Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22

Collaborative Filtering

Wikipedia:

In the newer, narrower sense, collaborative filtering is a method ofmaking automatic predictions (filtering) about the interests of a userby collecting preferences or taste information from many users(collaborating).

Fundamental ideas:

If two items get similar rating patterns then they are probably similarIf two users rated items in a similar fashion, then they will probablygive similar ratings to an unrated itemProperties of items are unknown

Applications:

Amazon (Customers Who Bought This Item Also Bought)NetflixSpotify

Binglin Chen RBM for Collaborative Filtering November 29, 2016 2 / 22

The Task: Netflix Prize

Basic statistics:

480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>

Training set:

100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5

Qualifying set:

2,817,131 ratings, <CustomerID, MovieID, Date>

Sparsity of ratings:100, 480, 507 + 2, 817, 131

480, 189× 17, 770= 0.0121 = 1.21%

Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22

The Task: Netflix Prize

Basic statistics:

480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>

Training set:

100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5

Qualifying set:

2,817,131 ratings, <CustomerID, MovieID, Date>

Sparsity of ratings:100, 480, 507 + 2, 817, 131

480, 189× 17, 770= 0.0121 = 1.21%

Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22

The Task: Netflix Prize

Basic statistics:

480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>

Training set:

100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5

Qualifying set:

2,817,131 ratings, <CustomerID, MovieID, Date>

Sparsity of ratings:100, 480, 507 + 2, 817, 131

480, 189× 17, 770= 0.0121 = 1.21%

Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22

The Task: Netflix Prize

Basic statistics:

480,189 users, <CustomerID>17,770 movies, <MovieID, YearOfRelease, Title>

Training set:

100,480,507 ratings, <CustomerID, MovieID, Rating, Date>Rating is an integer that ranges from 1 to 5

Qualifying set:

2,817,131 ratings, <CustomerID, MovieID, Date>

Sparsity of ratings:100, 480, 507 + 2, 817, 131

480, 189× 17, 770= 0.0121 = 1.21%

Binglin Chen RBM for Collaborative Filtering November 29, 2016 3 / 22

Binary RBM

v1 v2 v3 vN

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}N

h ∈ {0, 1}M

W1,1 WM,N W ∈ RM×N

b1 b2 b3 bN

a1 a2 a3 aM

b ∈ RN

a ∈ RM

E(h,v) = −aTh− bTv − hTWv

p(h,v) =1

Zexp(−E(h,v))

Z =∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22

Binary RBM

v1 v2 v3 vN

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}N

h ∈ {0, 1}M

W1,1 WM,N W ∈ RM×N

b1 b2 b3 bN

a1 a2 a3 aM

b ∈ RN

a ∈ RM

E(h,v) = −aTh− bTv − hTWv

p(h,v) =1

Zexp(−E(h,v))

Z =∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22

Binary RBM

v1 v2 v3 vN

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}N

h ∈ {0, 1}M

W1,1 WM,N W ∈ RM×N

b1 b2 b3 bN

a1 a2 a3 aM

b ∈ RN

a ∈ RM

E(h,v) = −aTh− bTv − hTWv

p(h,v) =1

Zexp(−E(h,v))

Z =∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22

Binary RBM

v1 v2 v3 vN

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}N

h ∈ {0, 1}M

W1,1 WM,N W ∈ RM×N

b1 b2 b3 bN

a1 a2 a3 aM

b ∈ RN

a ∈ RM

E(h,v) = −aTh− bTv − hTWv

p(h,v) =1

Zexp(−E(h,v))

Z =∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22

Binary RBM

v1 v2 v3 vN

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}N

h ∈ {0, 1}M

W1,1 WM,N W ∈ RM×N

b1 b2 b3 bN

a1 a2 a3 aM

b ∈ RN

a ∈ RM

E(h,v) = −aTh− bTv − hTWv

p(h,v) =1

Zexp(−E(h,v))

Z =∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22

Binary RBM

v1 v2 v3 vN

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}N

h ∈ {0, 1}M

W1,1 WM,N W ∈ RM×N

b1 b2 b3 bN

a1 a2 a3 aM

b ∈ RN

a ∈ RM

E(h,v) = −aTh− bTv − hTWv

p(h,v) =1

Zexp(−E(h,v))

Z =∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 4 / 22

Binary RBM, continued

p(h|v)

=p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = p(h,v)

p(v)

=1

p(v)

1

Zexp(aTh+ bTv + hTWv)

=1

p(v)

1

Zexp(bTv) exp(aTh+ hTWv)

=1

Z ′exp(aTh+ hTWv)

=1

Z ′exp

(M∑i=1

aihi +

M∑i=1

hiW i,:v

)

=1

Z ′exp

(M∑i=1

hi(ai +W i,:v)

)

=1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 5 / 22

Binary RBM, continued

p(h|v) = 1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

p(hi|v) ∝ exp (hi(ai +W i,:v))

p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)

=exp (ai +W i,:v)

1 + exp (ai +W i,:v)

= sigmoid (ai +W i,:v)

p(vj = 1|h) = sigmoid(bj + hTW :,j

)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22

Binary RBM, continued

p(h|v) = 1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

p(hi|v) ∝ exp (hi(ai +W i,:v))

p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)

=exp (ai +W i,:v)

1 + exp (ai +W i,:v)

= sigmoid (ai +W i,:v)

p(vj = 1|h) = sigmoid(bj + hTW :,j

)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22

Binary RBM, continued

p(h|v) = 1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

p(hi|v) ∝ exp (hi(ai +W i,:v))

p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)

=exp (ai +W i,:v)

1 + exp (ai +W i,:v)

= sigmoid (ai +W i,:v)

p(vj = 1|h) = sigmoid(bj + hTW :,j

)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22

Binary RBM, continued

p(h|v) = 1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

p(hi|v) ∝ exp (hi(ai +W i,:v))

p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)

=exp (ai +W i,:v)

1 + exp (ai +W i,:v)

= sigmoid (ai +W i,:v)

p(vj = 1|h) = sigmoid(bj + hTW :,j

)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22

Binary RBM, continued

p(h|v) = 1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

p(hi|v) ∝ exp (hi(ai +W i,:v))

p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)

=exp (ai +W i,:v)

1 + exp (ai +W i,:v)

= sigmoid (ai +W i,:v)

p(vj = 1|h) = sigmoid(bj + hTW :,j

)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22

Binary RBM, continued

p(h|v) = 1

Z ′

M∏i=1

exp (hi(ai +W i,:v))

p(hi|v) ∝ exp (hi(ai +W i,:v))

p(hi = 1|v) = p(hi = 1|v)p(hi = 0|v) + p(hi = 1|v)

=exp (ai +W i,:v)

1 + exp (ai +W i,:v)

= sigmoid (ai +W i,:v)

p(vj = 1|h) = sigmoid(bj + hTW :,j

)Binglin Chen RBM for Collaborative Filtering November 29, 2016 6 / 22

Binary RBM Learning

Binary RBM’s parameters: θ = {W ,a, b}

Maximum likelihood principle: argmaxθ p(v(1),v(2), . . . ,v(T )|θ)

No closed-form solution

Resort to gradient ascent

Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22

Binary RBM Learning

Binary RBM’s parameters: θ = {W ,a, b}Maximum likelihood principle: argmaxθ p(v

(1),v(2), . . . ,v(T )|θ)

No closed-form solution

Resort to gradient ascent

Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22

Binary RBM Learning

Binary RBM’s parameters: θ = {W ,a, b}Maximum likelihood principle: argmaxθ p(v

(1),v(2), . . . ,v(T )|θ)No closed-form solution

Resort to gradient ascent

Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22

Binary RBM Learning

Binary RBM’s parameters: θ = {W ,a, b}Maximum likelihood principle: argmaxθ p(v

(1),v(2), . . . ,v(T )|θ)No closed-form solution

Resort to gradient ascent

Binglin Chen RBM for Collaborative Filtering November 29, 2016 7 / 22

Binary RBM Learning, continued

`(θ)

=

T∑t=1

log p(v(t))

=

T∑t=1

log∑h

p(h,v(t))

=T∑t=1

log∑h

1

Zexp

(−E(h,v(t))

)=

T∑t=1

log1

Z

∑h

exp(−E(h,v(t))

)=

T∑t=1

log∑h

exp(−E(h,v(t))

)− T logZ

=T∑t=1

log∑h

exp(−E(h,v(t))

)− T log

∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log p(v(t))

=

T∑t=1

log∑h

p(h,v(t))

=T∑t=1

log∑h

1

Zexp

(−E(h,v(t))

)=

T∑t=1

log1

Z

∑h

exp(−E(h,v(t))

)=

T∑t=1

log∑h

exp(−E(h,v(t))

)− T logZ

=T∑t=1

log∑h

exp(−E(h,v(t))

)− T log

∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log p(v(t))

=

T∑t=1

log∑h

p(h,v(t))

=T∑t=1

log∑h

1

Zexp

(−E(h,v(t))

)=

T∑t=1

log1

Z

∑h

exp(−E(h,v(t))

)=

T∑t=1

log∑h

exp(−E(h,v(t))

)− T logZ

=T∑t=1

log∑h

exp(−E(h,v(t))

)− T log

∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log p(v(t))

=

T∑t=1

log∑h

p(h,v(t))

=

T∑t=1

log∑h

1

Zexp

(−E(h,v(t))

)

=T∑t=1

log1

Z

∑h

exp(−E(h,v(t))

)=

T∑t=1

log∑h

exp(−E(h,v(t))

)− T logZ

=T∑t=1

log∑h

exp(−E(h,v(t))

)− T log

∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log p(v(t))

=

T∑t=1

log∑h

p(h,v(t))

=

T∑t=1

log∑h

1

Zexp

(−E(h,v(t))

)=

T∑t=1

log1

Z

∑h

exp(−E(h,v(t))

)

=T∑t=1

log∑h

exp(−E(h,v(t))

)− T logZ

=T∑t=1

log∑h

exp(−E(h,v(t))

)− T log

∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log p(v(t))

=

T∑t=1

log∑h

p(h,v(t))

=

T∑t=1

log∑h

1

Zexp

(−E(h,v(t))

)=

T∑t=1

log1

Z

∑h

exp(−E(h,v(t))

)=

T∑t=1

log∑h

exp(−E(h,v(t))

)− T logZ

=T∑t=1

log∑h

exp(−E(h,v(t))

)− T log

∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log p(v(t))

=

T∑t=1

log∑h

p(h,v(t))

=

T∑t=1

log∑h

1

Zexp

(−E(h,v(t))

)=

T∑t=1

log1

Z

∑h

exp(−E(h,v(t))

)=

T∑t=1

log∑h

exp(−E(h,v(t))

)− T logZ

=

T∑t=1

log∑h

exp(−E(h,v(t))

)− T log

∑h,v

exp(−E(h,v))

Binglin Chen RBM for Collaborative Filtering November 29, 2016 8 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β

=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]

Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]

Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]

Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]

Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]

Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]

Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]

Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Learning, continued

`(θ) =

T∑t=1

log∑h

exp(−E(h,v

(t)))− T log

∑h,v

exp(−E(h,v))

∂`(θ)

∂β=

T∑t=1

∂ log∑

h exp(−E(h,v(t))

)∂β

− T∂ log

∑h,v exp(−E(h,v))

∂β

=

T∑t=1

∂∑

h exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T ∂∑

h,v exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

∂ exp(−E(h,v(t))

)∂β∑

h exp(−E(h,v(t))

) − T∑h,v∂ exp(−E(h,v))

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h exp

(−E(h,v(t))

)∂−E(h,v(t))

∂β∑h exp

(−E(h,v(t))

) − T∑

h,v exp(−E(h,v))∂−E(h,v)

∂β∑h,v exp(−E(h,v))

=

T∑t=1

∑h

exp(−E(h,v(t))

)∑

h exp(−E(h,v(t))

) ∂ − E(h,v(t))

∂β− T

∑h,v

exp(−E(h,v))∑h,v exp(−E(h,v))

∂ − E(h,v)

∂β

=

T∑t=1

∑h

p(h|v(t))∂ − E(h,v(t))

∂β− T

∑h,v

p(h,v)∂ − E(h,v)

∂β

=

T∑t=1

Ep(h|v(t))

[∂ − E(h,v(t))

∂β

]− T Ep(h,v)

[∂ − E(h,v)

∂β

]Binglin Chen RBM for Collaborative Filtering November 29, 2016 9 / 22

Binary RBM Summary

A two layer RBM can be fully characterized by the following:

E(h,v)p(h,v)Zp(hi = s|v)p(vj = t|h)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 10 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary

v1,1 v1,2 v1,3 v1,K v2,1 vN,K

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}NK

h ∈ {0, 1}M

W1,1,1 WM,N,K W ∈ RM×NK

b1,1 b1,2 b1,3 b1,K b2,1 bN,K

a1 a2 a3 aM

b ∈ RNK

a ∈ RM

vi,: Valid Assignment?00001

01000√

11010 ×00000 ×

Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary

v1,1 v1,2 v1,3 v1,K v2,1 vN,K

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}NK

h ∈ {0, 1}M

W1,1,1 WM,N,K W ∈ RM×NK

b1,1 b1,2 b1,3 b1,K b2,1 bN,K

a1 a2 a3 aM

b ∈ RNK

a ∈ RM

vi,: Valid Assignment?00001

01000√

11010 ×00000 ×

Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary

v1,1 v1,2 v1,3 v1,K v2,1 vN,K

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}NK

h ∈ {0, 1}M

W1,1,1 WM,N,K W ∈ RM×NK

b1,1 b1,2 b1,3 b1,K b2,1 bN,K

a1 a2 a3 aM

b ∈ RNK

a ∈ RM

vi,: Valid Assignment?00001

01000√

11010 ×00000 ×

Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary

v1,1 v1,2 v1,3 v1,K v2,1 vN,K

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}NK

h ∈ {0, 1}M

W1,1,1 WM,N,K W ∈ RM×NK

b1,1 b1,2 b1,3 b1,K b2,1 bN,K

a1 a2 a3 aM

b ∈ RNK

a ∈ RM

vi,: Valid Assignment?00001

01000√

11010 ×00000 ×

Binglin Chen RBM for Collaborative Filtering November 29, 2016 11 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary, continued

Binary K-nary

E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v)) K-nary requires valid v.

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v)) K-nary requires valid v.

p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.

p(vj = 1|h) sigmoid(bj + hTW :,j

)Undefined for K-nary.

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) Undefined for Binary.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary, continued

Binary K-nary

E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v)) K-nary requires valid v.

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v)) K-nary requires valid v.

p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.

p(vj = 1|h) sigmoid(bj + hTW :,j

)Undefined for K-nary.

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) Undefined for Binary.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary, continued

Binary K-nary

E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v)) K-nary requires valid v.

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v)) K-nary requires valid v.

p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.

p(vj = 1|h) sigmoid(bj + hTW :,j

)Undefined for K-nary.

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) Undefined for Binary.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary, continued

Binary K-nary

E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v)) K-nary requires valid v.

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v)) K-nary requires valid v.

p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.

p(vj = 1|h) sigmoid(bj + hTW :,j

)Undefined for K-nary.

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) Undefined for Binary.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary, continued

Binary K-nary

E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v)) K-nary requires valid v.

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v)) K-nary requires valid v.

p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.

p(vj = 1|h) sigmoid(bj + hTW :,j

)Undefined for K-nary.

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) Undefined for Binary.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary, continued

Binary K-nary

E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v)) K-nary requires valid v.

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v)) K-nary requires valid v.

p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.

p(vj = 1|h) sigmoid(bj + hTW :,j

)Undefined for K-nary.

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) Undefined for Binary.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22

Towards RBM for CF, First Step: Make Visible Units K-nary, continued

Binary K-nary

E(h,v) −aTh− bTv − hTWv −aTh− bTv − hTWv K-nary requires valid v.

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v)) K-nary requires valid v.

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v)) K-nary requires valid v.

p(hi = 1|v) sigmoid (ai +W i,:v) sigmoid (ai +W i,:v) K-nary requires valid v.

p(vj = 1|h) sigmoid(bj + hTW :,j

)Undefined for K-nary.

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) Undefined for Binary.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 12 / 22

Towards RBM for CF, Second Step: Handle Missing Values

How to handle missing values?

Imputation

Fill in the blanks with some estimation based on available information.

Parameter Sharing

Use RBM with different numbers of visible units for differenttraining/test cases.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 13 / 22

Towards RBM for CF, Second Step: Handle Missing Values

How to handle missing values?

Imputation

Fill in the blanks with some estimation based on available information.

Parameter Sharing

Use RBM with different numbers of visible units for differenttraining/test cases.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 13 / 22

Towards RBM for CF, Second Step: Handle Missing Values

How to handle missing values?

Imputation

Fill in the blanks with some estimation based on available information.

Parameter Sharing

Use RBM with different numbers of visible units for differenttraining/test cases.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 13 / 22

Towards RBM for CF, Second Step: Handle Missing Values, continued

Suppose K = 3 for the following case:

m1 m2 m3u1 1 ? ?u2 2 ? 2u3 1 3 ?

v v Visible Unitsu1 100 ??? ??? 100 v1,1, v1,2, v1,3u2 010 ??? 010 010 010 v1,1, v1,2, v1,3, v3,1, v3,2, v3,3u3 100 001 ??? 100 001 v1,1, v1,2, v1,3, v2,1, v2,2, v2,3

Binglin Chen RBM for Collaborative Filtering November 29, 2016 14 / 22

Towards RBM for CF, Second Step: Handle Missing Values, continued

Suppose K = 3 for the following case:

m1 m2 m3u1 1 ? ?u2 2 ? 2u3 1 3 ?

v v Visible Unitsu1 100 ??? ??? 100 v1,1, v1,2, v1,3u2 010 ??? 010 010 010 v1,1, v1,2, v1,3, v3,1, v3,2, v3,3u3 100 001 ??? 100 001 v1,1, v1,2, v1,3, v2,1, v2,2, v2,3

Binglin Chen RBM for Collaborative Filtering November 29, 2016 14 / 22

Towards RBM for CF, Third Step: Make Predictions

p(vq,k = 1|v)

=∑h

p(h, v, vq,k)

∝∑h

exp(−E(h, v, vq,k))

Can be computed in polynomial time.

p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)

However, query of S movies on a user requires KS evaluations.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22

Towards RBM for CF, Third Step: Make Predictions

p(vq,k = 1|v) =∑h

p(h, v, vq,k)

∝∑h

exp(−E(h, v, vq,k))

Can be computed in polynomial time.

p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)

However, query of S movies on a user requires KS evaluations.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22

Towards RBM for CF, Third Step: Make Predictions

p(vq,k = 1|v) =∑h

p(h, v, vq,k)

∝∑h

exp(−E(h, v, vq,k))

Can be computed in polynomial time.

p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)

However, query of S movies on a user requires KS evaluations.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22

Towards RBM for CF, Third Step: Make Predictions

p(vq,k = 1|v) =∑h

p(h, v, vq,k)

∝∑h

exp(−E(h, v, vq,k))

Can be computed in polynomial time.

p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)

However, query of S movies on a user requires KS evaluations.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22

Towards RBM for CF, Third Step: Make Predictions

p(vq,k = 1|v) =∑h

p(h, v, vq,k)

∝∑h

exp(−E(h, v, vq,k))

Can be computed in polynomial time.

p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)

However, query of S movies on a user requires KS evaluations.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22

Towards RBM for CF, Third Step: Make Predictions

p(vq,k = 1|v) =∑h

p(h, v, vq,k)

∝∑h

exp(−E(h, v, vq,k))

Can be computed in polynomial time.

p(vq1,k1 = 1, vq2,k2 = 1, . . . , vqS ,kS = 1|v)

However, query of S movies on a user requires KS evaluations.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 15 / 22

Towards RBM for CF, Third Step: Make Predictions, continued

hi = p(hi = 1|v) = sigmoid(ai + W i,:v

)

p(vq,k = 1|h) =exp

(bq,k + h

TW :,q,k

)∑K

l=1 exp(bq,l + h

TW :,q,l

)

A lot faster but slightly less accurate.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 16 / 22

Towards RBM for CF, Third Step: Make Predictions, continued

hi = p(hi = 1|v) = sigmoid(ai + W i,:v

)

p(vq,k = 1|h) =exp

(bq,k + h

TW :,q,k

)∑K

l=1 exp(bq,l + h

TW :,q,l

)

A lot faster but slightly less accurate.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 16 / 22

Towards RBM for CF, Third Step: Make Predictions, continued

hi = p(hi = 1|v) = sigmoid(ai + W i,:v

)

p(vq,k = 1|h) =exp

(bq,k + h

TW :,q,k

)∑K

l=1 exp(bq,l + h

TW :,q,l

)

A lot faster but slightly less accurate.

Binglin Chen RBM for Collaborative Filtering November 29, 2016 16 / 22

Extension 1: Gaussian Hidden Units

K-nary K-nary with Gaussian

E(h,v) −aTh− bTv − hTWvM∑i=1

(hi − ai)2

2σi2− bTv − hTWv

p(h,v)1

Zexp(−E(h,v))

1

Zexp(−E(h,v))

Z∑

h,v exp(−E(h,v))∑

h,v exp(−E(h,v))

p(vj,k = 1|h)exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

) exp(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

)p(hi = 1|v) sigmoid (ai +W i,:v)

p(hi = h|v) 1√2πσi

exp

(−(h− ai − σiW i,:v)

2

2σ2i

)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 17 / 22

Extension 2: Condition On What Users Have Watched

v1,1 v1,K v2,1 vN,K

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}NK

h ∈ {0, 1}M

WM,N,K W ∈ RM×NK

b1,1 b1,K b2,1 bN,K

a1 a2 a3 aM

b ∈ RNK

a ∈ RM

r ∈ {0, 1}Nr1 r2 rN

U ∈ RM×NU1,1

E(h,v, r) = −aTh− bTv − hTWv − hTWr

p(vj,k = 1|h) =exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

)p(hi = 1|v, r) = sigmoid (ai +W i,:v +U i,:r)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 18 / 22

Extension 2: Condition On What Users Have Watched

v1,1 v1,K v2,1 vN,K

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}NK

h ∈ {0, 1}M

WM,N,K W ∈ RM×NK

b1,1 b1,K b2,1 bN,K

a1 a2 a3 aM

b ∈ RNK

a ∈ RM

r ∈ {0, 1}Nr1 r2 rN

U ∈ RM×NU1,1

E(h,v, r) = −aTh− bTv − hTWv − hTWr

p(vj,k = 1|h) =exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

)p(hi = 1|v, r) = sigmoid (ai +W i,:v +U i,:r)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 18 / 22

Extension 2: Condition On What Users Have Watched

v1,1 v1,K v2,1 vN,K

h1 h2 h3 hM

visible layer

hidden layer

v ∈ {0, 1}NK

h ∈ {0, 1}M

WM,N,K W ∈ RM×NK

b1,1 b1,K b2,1 bN,K

a1 a2 a3 aM

b ∈ RNK

a ∈ RM

r ∈ {0, 1}Nr1 r2 rN

U ∈ RM×NU1,1

E(h,v, r) = −aTh− bTv − hTWv − hTWr

p(vj,k = 1|h) =exp

(bj,k + hTW :,j,k

)∑Kl=1 exp

(bj,l + hTW :,j,l

)p(hi = 1|v, r) = sigmoid (ai +W i,:v +U i,:r)

Binglin Chen RBM for Collaborative Filtering November 29, 2016 18 / 22

Extension 3: Factorize W

W ∈ RM×NK

100× 17, 770× 5 = 8, 885, 000

Factorize W into PQ

P ∈ RM×C , Q ∈ RC×NK

C �M and C � NK

100× 30 + 30× 17, 770× 5 = 2, 668, 500

Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22

Extension 3: Factorize W

W ∈ RM×NK

100× 17, 770× 5 = 8, 885, 000

Factorize W into PQ

P ∈ RM×C , Q ∈ RC×NK

C �M and C � NK

100× 30 + 30× 17, 770× 5 = 2, 668, 500

Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22

Extension 3: Factorize W

W ∈ RM×NK

100× 17, 770× 5 = 8, 885, 000

Factorize W into PQ

P ∈ RM×C , Q ∈ RC×NK

C �M and C � NK

100× 30 + 30× 17, 770× 5 = 2, 668, 500

Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22

Extension 3: Factorize W

W ∈ RM×NK

100× 17, 770× 5 = 8, 885, 000

Factorize W into PQ

P ∈ RM×C , Q ∈ RC×NK

C �M and C � NK

100× 30 + 30× 17, 770× 5 = 2, 668, 500

Binglin Chen RBM for Collaborative Filtering November 29, 2016 19 / 22

Experimental Results

Binglin Chen RBM for Collaborative Filtering November 29, 2016 20 / 22

Questions?

Binglin Chen RBM for Collaborative Filtering November 29, 2016 21 / 22

Thank you!

Binglin Chen RBM for Collaborative Filtering November 29, 2016 22 / 22