Steering Time-Dependent Estimation of Posteriors with HYperparameter Indexing in Bayesian Topic Models (STEPHY)

Tomonari MASADA (正田备也)

Nagasaki University
masada@nagasaki-u.ac.jp

OUTLINE (1/3)
• Aim
  – Improve LDA [Blei et al. 03] in terms of perplexity by using document timestamps
  – e.g. SNS documents are timestamped (Facebook, Twitter, Weibo, ...)

OUTLINE (2/3)
• Our approach
  – Prepare a word multinomial for each timestamp (topic = word multinomial)
    • LDA: K word multinomials
    • (Ours): T x K word multinomials
• Topic distributions vary along time. (Increase # basis coefficient vectors)
• Word distributions vary along time. (Increase # basis vectors)

OUTLINE (3/3)
• Problem
  – Overfitting
    • T x K x W word multinomial params
• Proposal
  – Hyperparameter indexing
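To make the overfitting concern concrete, the short sketch below counts word multinomial parameters for plain LDA (K x W) and for the time-dependent variant (T x K x W). W and T are the NIPS figures from the data specs in this deck; K = 100 is an assumed value chosen only for illustration, and the function name is hypothetical.

```python
def word_param_count(num_topics, vocab_size, num_timestamps=1):
    """Number of word multinomial parameters: T x K x W (T = 1 gives plain LDA)."""
    return num_timestamps * num_topics * vocab_size


# W and T come from the NIPS row of the data specs; K = 100 is an assumption.
K, W, T = 100, 11_998, 13

lda_params = word_param_count(K, W)
timed_params = word_param_count(K, W, T)

print("LDA params:        ", lda_params)
print("time-indexed params:", timed_params)
print("growth factor:      ", timed_params // lda_params)
```

With these numbers the parameter count grows by a factor of T while the data (919,916 distinct doc-word pairs for NIPS) stays the same, which is the overfitting risk the slide points to.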

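LDA
• Word multinomials: Multi(φ1), Multi(φ2), ..., Multi(φK), where φk = (φk1, φk2, ..., φkW)
• Prior: φ1, ..., φK ~ Di(β), where β = (β1, β2, ..., βW)

Option 0
• Word multinomial params: φ11, ..., φ1K, ..., φT1, ..., φTK
• Prior: Di(β), where β = (β1, β2, ..., βW)

Option 1
• Word multinomial params: φ11, ..., φ1K, ..., φT1, ..., φTK
• Prior: Di(β1), ..., Di(βK), where βk = (βk1, βk2, ..., βkW)

Option 2
• Word multinomial params: φ11, ..., φ1K, ..., φT1, ..., φTK
• Prior: Di(β1), ..., Di(βT), where βt = (βt1, βt2, ..., βtW)

Option 3
• Word multinomial params: φ11, ..., φ1K, ..., φT1, ..., φTK
• Prior: Di(β11), ..., Di(β1K), ..., Di(βT1), ..., Di(βTK), where βtk = (βtk1, βtk2, ..., βtkW)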
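The four options differ only in which Dirichlet hyperparameter vector is attached to each φtk. A minimal sketch of that indexing, assuming symmetric initial hyperparameters and using hypothetical function names (`beta_index`, `count_betas`, `sample_phi`), could look like this:

```python
import random


def beta_index(option, t, k):
    """Key of the hyperparameter vector that serves as the prior of phi[t, k]."""
    if option == 0:
        return ()        # one beta shared by all timestamps and topics
    if option == 1:
        return (k,)      # one beta per topic
    if option == 2:
        return (t,)      # one beta per timestamp
    if option == 3:
        return (t, k)    # one beta per (timestamp, topic) pair
    raise ValueError(f"unknown option: {option}")


def count_betas(option, T, K):
    """Number of distinct hyperparameter vectors under the given option."""
    return len({beta_index(option, t, k) for t in range(T) for k in range(K)})


def sample_dirichlet(beta, rng):
    """Draw from Di(beta) by normalizing independent gamma variates."""
    draws = [rng.gammavariate(b, 1.0) for b in beta]
    total = sum(draws)
    return [d / total for d in draws]


def sample_phi(option, T, K, W, rng, concentration=0.5):
    """Draw T x K word multinomials; the symmetric concentration is an assumption."""
    betas, phi = {}, {}
    for t in range(T):
        for k in range(K):
            beta = betas.setdefault(beta_index(option, t, k), [concentration] * W)
            phi[t, k] = sample_dirichlet(beta, rng)
    return phi, betas


rng = random.Random(0)
for option in range(4):
    phi, betas = sample_phi(option, T=3, K=4, W=5, rng=rng)
    print(f"Option {option}: {len(phi)} word multinomials, {len(betas)} beta vectors")
```

In every option the number of word multinomials is T x K; only the number of hyperparameter vectors changes (1, K, T, or T x K).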

PROPOSAL
• Models used: LDA, Option 1, Option 3
• STEPHY: Steering Time-dependent Estimation of Posteriors with HYperparameter indexing in Bayesian topic models

STEPHY
• VB for LDA (Time Independent Model) x 50 iters
• VB for Option 1 (Slightly Time Dependent Model) x 140 iters
• VB for Option 3 (Heavily Time Dependent Model) x 10 iters

VB UPDATE EQUATIONS (LDA, Option 1, Option 3)
[Equation slide. Only the subscripts survive: j (document), w (word), k (topic) for LDA, plus t (timestamp) for Options 1 and 3, where the sums run over the documents j carrying timestamp t.]
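For orientation, the textbook VB updates for smoothed LDA, together with a time-indexed variant that matches the T x K word multinomials of Options 1 and 3, can be written as below. This is a sketch under assumptions, not a transcription of the slide: the symbols γ (responsibilities), ζ (document-topic Dirichlet params), λ (topic-word Dirichlet params), α (document-topic hyperparameter), n_jw (count of word w in document j), and t_j (timestamp of document j) are chosen here for illustration, and the hyperparameter updates are omitted.

```latex
\begin{align*}
% LDA (time-independent model)
\gamma_{jwk} &\propto \exp\Bigl[\Psi(\zeta_{jk}) + \Psi(\lambda_{kw})
                - \Psi\Bigl(\textstyle\sum_{w'} \lambda_{kw'}\Bigr)\Bigr] \\
\zeta_{jk}   &= \alpha_k + \sum_{w} n_{jw}\,\gamma_{jwk} \\
\lambda_{kw} &= \beta_{w} + \sum_{j} n_{jw}\,\gamma_{jwk} \\[6pt]
% Options 1 and 3 (time-dependent models)
\gamma_{jwk} &\propto \exp\Bigl[\Psi(\zeta_{jk}) + \Psi(\lambda_{t_j k w})
                - \Psi\Bigl(\textstyle\sum_{w'} \lambda_{t_j k w'}\Bigr)\Bigr] \\
\lambda_{tkw} &= \beta_{kw} + \sum_{j \,:\, t_j = t} n_{jw}\,\gamma_{jwk}
                \quad \text{(Option 1)} \\
\lambda_{tkw} &= \beta_{tkw} + \sum_{j \,:\, t_j = t} n_{jw}\,\gamma_{jwk}
                \quad \text{(Option 3)}
\end{align*}
```

The only difference between the two time-dependent variants in this form is the index carried by the hyperparameter β, which is what "hyperparameter indexing" refers to.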

STEPHY
• Conduct Multistage Inference Over Different Topic Models Having Compatible Parameters
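One plausible reading of "compatible parameters" is that the posterior estimated in one stage can seed the next stage after its parameters are replicated along the new index. The sketch below illustrates that reading only; the function names, the list-of-lists layout, and the copy-based hand-off are assumptions rather than the author's confirmed procedure.

```python
def lda_to_option1(lam_kw, beta_w, num_timestamps):
    """Seed Option 1 from LDA: copy lambda[k][w] to every timestamp, beta[w] to every topic."""
    lam_tkw = [[row[:] for row in lam_kw] for _ in range(num_timestamps)]
    beta_kw = [beta_w[:] for _ in lam_kw]
    return lam_tkw, beta_kw


def option1_to_option3(lam_tkw, beta_kw):
    """Seed Option 3 from Option 1: keep lambda[t][k][w], copy beta[k][w] to every timestamp."""
    beta_tkw = [[row[:] for row in beta_kw] for _ in lam_tkw]
    return lam_tkw, beta_tkw


# Toy sizes: K = 2 topics, W = 3 words, T = 4 timestamps (illustrative values).
lam_kw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
beta_w = [0.1, 0.1, 0.1]

lam_tkw, beta_kw = lda_to_option1(lam_kw, beta_w, num_timestamps=4)
# ... VB iterations for Option 1 would update lam_tkw and beta_kw here ...
lam_tkw, beta_tkw = option1_to_option3(lam_tkw, beta_kw)

print(len(lam_tkw), len(lam_tkw[0]), len(lam_tkw[0][0]))     # T, K, W
print(len(beta_tkw), len(beta_tkw[0]), len(beta_tkw[0][0]))  # T, K, W
```

Because every copy is independent, later stages can let the per-timestamp values drift apart without touching the earlier stage's estimates.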

DATA SPECS
(J = # documents, W = # different words, T = # timestamps, P = # different doc-word pairs)

         J            W          T      P
NIPS     1,740        11,998     13     919,916
DBLP     1,235,988    273,173    20     7,814,175
DONGA    24,093       71,621     53     7,949,288
TDT      96,256       51,849     123    11,460,231
NSF      128,181      25,325     13     10,388,976
YOMI     367,910      84,060     52     32,762,456

COMPLEXITY
• Time: O(PK), P = #(diff doc-word pairs)
• Space: O(QK), Q = #(diff timestamp-word pairs)
  – No malloc for the parameters indexed by jwk
  – Malloc for the parameters indexed by twk
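The two quantities in the complexity bounds can be counted directly from a timestamped corpus. A minimal sketch, assuming each document is given as a (timestamp, list of word ids) pair (the input layout and the function name are illustrative):

```python
def count_pairs(docs):
    """Return (P, Q): distinct doc-word pairs and distinct timestamp-word pairs."""
    doc_word = set()
    time_word = set()
    for j, (t, words) in enumerate(docs):
        for w in words:
            doc_word.add((j, w))
            time_word.add((t, w))
    return len(doc_word), len(time_word)


# Toy corpus: documents 0 and 1 share timestamp 0, document 2 has timestamp 1.
docs = [
    (0, [1, 1, 2]),
    (0, [2, 3]),
    (1, [1, 3, 3]),
]
P, Q = count_pairs(docs)
print(P, Q)  # 6 distinct doc-word pairs, 5 distinct timestamp-word pairs
```

Q can never exceed P, since every doc-word pair maps to exactly one timestamp-word pair, so storing only the twk-indexed quantities needs no more memory than storing the jwk-indexed ones would.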

IMPLEMENTATION

• VB
  – Realm of embarrassing parallelism
    • OpenMP
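The parallelism comes from the fact that, with the topic-word quantities held fixed, each document's responsibilities can be computed independently of every other document. The slides use OpenMP; the sketch below only mirrors that structure in Python with a thread pool (which will not give a real speedup for pure Python code). The simplified update `weight[k] * topic_word_weight[k][w]` and all names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def doc_responsibilities(doc_counts, doc_topic_weight, topic_word_weight):
    """For one document, normalize doc_topic_weight[k] * topic_word_weight[k][w] over k."""
    gamma = {}
    for w in doc_counts:
        scores = [
            doc_topic_weight[k] * topic_word_weight[k][w]
            for k in range(len(doc_topic_weight))
        ]
        total = sum(scores)
        gamma[w] = [s / total for s in scores]
    return gamma


def parallel_doc_step(corpus, doc_topic_weights, topic_word_weight, workers=4):
    """Run the per-document step for all documents; no document reads another's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(
                doc_responsibilities,
                corpus,
                doc_topic_weights,
                repeat(topic_word_weight),
            )
        )


# Toy data: 2 documents (word id -> count), 2 topics, 2 words.
corpus = [{0: 2, 1: 1}, {1: 3}]
doc_topic_weights = [[1.0, 1.0], [3.0, 1.0]]
topic_word_weight = [[0.5, 0.5], [0.25, 0.75]]

for gamma in parallel_doc_step(corpus, doc_topic_weights, topic_word_weight):
    print(gamma)
```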

[Wang et al. 06]

• CGS for, VB for
  – Time Independent Model (LDA)
• VB for
  – Slightly Time Dependent Model (Option 1)
• VB for
  – Heavily Time Dependent Model (Option 3)

Iteration counts, in slide order: x 1000 iters, x 50 iters, x 5 iters

NEW RESULTS

x 50 iters LDA

CONCLUSION
• STEPHY
  – Conduct Multistage Inference Over Different Topic Models Having Compatible Parameters.
  – Can efficiently improve LDA in terms of test set perplexity.
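Test set perplexity, the evaluation measure named above, is the exponential of the negative mean log-probability that the model assigns to held-out word tokens; lower is better. A minimal sketch, assuming the per-token log-probabilities (natural log) have already been computed by the model:

```python
import math


def perplexity(token_log_probs):
    """exp(-mean log-probability) over held-out tokens; input uses natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that spreads its mass uniformly over 4 words has perplexity close to 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```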

FUTURE WORK
• Other types of mixture models
  – topic = Gaussian
• Bayesian nonparametrics
  – Topic distributions are left intact.
• Practical evaluation
  – e.g. Classification, Clustering, Topic detection, IR, ...
