Online Learning for Latent Dirichlet Allocation
Matthew D. Hoffman, David M. Blei and Francis Bach
NIPS 2010
Presented by Lingbo Li
Latent Dirichlet Allocation (LDA)
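1) Draw each topic $\beta_k \sim \mathrm{Dirichlet}(\eta)$, for $k = 1, \dots, K$.
2) For each document $d$:
   1) Draw topic proportions $\theta_d \sim \mathrm{Dirichlet}(\alpha)$.
   2) For each word $n$ in document $d$:
      1) Draw a topic assignment $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$.
      2) Draw the word $w_{dn} \sim \mathrm{Multinomial}(\beta_{z_{dn}})$.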
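The generative process can be simulated directly. Below is a minimal sketch in stdlib Python, assuming symmetric Dirichlet priors with scalar hyperparameters `alpha` (for the topic proportions $\theta_d$) and `eta` (for the topics $\beta_k$). The function names are hypothetical, and each Dirichlet draw is built by normalizing independent Gamma draws.

```python
import random

def sample_dirichlet(params, rng):
    # A Dirichlet draw: independent Gamma(params_i, 1) draws, normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    # Return index i with probability probs[i].
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_topics, vocab_size, doc_lengths, alpha, eta, seed=0):
    rng = random.Random(seed)
    # 1) Draw each topic beta_k ~ Dirichlet(eta) over the vocabulary.
    topics = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(num_topics)]
    corpus = []
    for length in doc_lengths:
        # 2.1) Draw topic proportions theta_d ~ Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words, assignments = [], []
        for _ in range(length):
            z = sample_categorical(theta, rng)      # 2.2.1) z_dn ~ Multinomial(theta_d)
            w = sample_categorical(topics[z], rng)  # 2.2.2) w_dn ~ Multinomial(beta_z)
            assignments.append(z)
            words.append(w)
        corpus.append((words, assignments))
    return topics, corpus
```

For example, `generate_corpus(3, 20, [50, 80], alpha=0.5, eta=0.1)` returns three topics over a 20-word vocabulary and two documents of 50 and 80 words, each paired with its hidden topic assignments.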
Batch variational Bayes for LDA
For a collection of documents, infer:
• Per-word topic assignments $z_{dn}$
• Per-document topic proportions $\theta_d$
• Topic distributions $\beta_k$
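The true posterior is approximated by a fully factorized variational distribution:
$q(z_{dn} = k) = \phi_{d w_{dn} k}, \quad q(\theta_d) = \mathrm{Dirichlet}(\theta_d; \gamma_d), \quad q(\beta_k) = \mathrm{Dirichlet}(\beta_k; \lambda_k)$
Optimize the evidence lower bound (ELBO) over the variational parameters $\phi$, $\gamma$, $\lambda$:
$\mathcal{L}(w, \phi, \gamma, \lambda) \triangleq \mathbb{E}_q[\log p(w, z, \theta, \beta \mid \alpha, \eta)] - \mathbb{E}_q[\log q(z, \theta, \beta)]$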
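Batch variational Bayes alternates an E step (per document, iterate $\phi_{dwk} \propto \exp\{\mathbb{E}_q[\log \theta_{dk}] + \mathbb{E}_q[\log \beta_{kw}]\}$ and $\gamma_{dk} = \alpha + \sum_w n_{dw}\phi_{dwk}$) with an M step ($\lambda_{kw} = \eta + \sum_d n_{dw}\phi_{dwk}$), where $n_{dw}$ counts word $w$ in document $d$. A minimal stdlib sketch of that loop follows. Assumptions: the standard library has no digamma, so a hand-rolled series approximation is used; $\lambda$ is initialized from Gamma(100, 0.01), an illustrative choice; all function names are hypothetical.

```python
import math
import random

def digamma(x):
    # No digamma in the stdlib: shift x upward using psi(x) = psi(x + 1) - 1/x,
    # then apply the asymptotic series for large x.
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def dirichlet_expectation(params):
    # E_q[log x_i] for x ~ Dirichlet(params) is psi(params_i) - psi(sum(params)).
    psi_total = digamma(sum(params))
    return [digamma(p) - psi_total for p in params]

def e_step(counts, exp_elog_beta, alpha, num_topics, tol=1e-5, max_iter=100):
    # counts maps word id w -> n_dw; exp_elog_beta[k][w] = exp(E_q[log beta_kw]).
    gamma = [1.0] * num_topics
    phi = {}
    for _ in range(max_iter):
        exp_elog_theta = [math.exp(v) for v in dirichlet_expectation(gamma)]
        new_gamma = [alpha] * num_topics
        for w, n in counts.items():
            # phi_dwk is proportional to exp(E_q[log theta_dk] + E_q[log beta_kw]).
            unnorm = [exp_elog_theta[k] * exp_elog_beta[k][w] for k in range(num_topics)]
            norm = sum(unnorm)
            phi[w] = [u / norm for u in unnorm]
            # gamma_dk = alpha + sum_w n_dw * phi_dwk
            for k in range(num_topics):
                new_gamma[k] += n * phi[w][k]
        mean_change = sum(abs(a - b) for a, b in zip(new_gamma, gamma)) / num_topics
        gamma = new_gamma
        if mean_change < tol:
            break
    return gamma, phi

def batch_vb(docs, num_topics, vocab_size, alpha, eta, num_iters=20, seed=0):
    rng = random.Random(seed)
    # Assumed initialization: lambda_kw ~ Gamma(100, 0.01) (mean 1) to break topic symmetry.
    lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab_size)]
           for _ in range(num_topics)]
    gammas = []
    for _ in range(num_iters):
        exp_elog_beta = [[math.exp(v) for v in dirichlet_expectation(row)] for row in lam]
        # Run the E step on every document while accumulating the M-step sums.
        new_lam = [[eta] * vocab_size for _ in range(num_topics)]
        gammas = []
        for counts in docs:
            gamma, phi = e_step(counts, exp_elog_beta, alpha, num_topics)
            gammas.append(gamma)
            for w, n in counts.items():
                for k in range(num_topics):
                    new_lam[k][w] += n * phi[w][k]  # lambda_kw = eta + sum_d n_dw * phi_dwk
        lam = new_lam
    return lam, gammas
```

Because each word's $\phi_{dw\cdot}$ sums to one, a useful sanity check is that $\sum_k \gamma_{dk} = K\alpha + \sum_w n_{dw}$ for every document. Note that every pass touches the whole corpus, which is the cost the online algorithm avoids.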
Analysis of convergence
• Multiply the gradients by the inverse of an appropriate positive definite matrix H to speed up stochastic gradient algorithms.
• H: the Fisher information matrix of the variational distribution q, so that $H^{-1}$ times the gradient is the natural gradient
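In the paper, choosing H as the Fisher information of $q(\beta)$ makes the natural gradient with respect to $\lambda$ equal to $\tilde\lambda - \lambda$, where $\tilde\lambda_{kw} = \eta + D\, n_{tw}\phi_{twk}$ is computed from one sampled document $t$ in a corpus of $D$ documents. A natural-gradient step of size $\rho_t = (\tau_0 + t)^{-\kappa}$ is then just the blend $(1-\rho_t)\lambda + \rho_t\tilde\lambda$. The sketch below shows that equivalence in stdlib Python; the function names and the default values `tau0=64.0`, `kappa=0.7` are illustrative assumptions, not the paper's settings.

```python
def step_size(t, tau0=64.0, kappa=0.7):
    # rho_t = (tau0 + t)^(-kappa). For kappa in (0.5, 1], sum_t rho_t diverges while
    # sum_t rho_t^2 converges: the step-size conditions used in the convergence analysis.
    return (tau0 + t) ** (-kappa)

def lambda_hat(counts, phi, eta, num_docs, num_topics, vocab_size):
    # lambda_hat_kw = eta + D * n_tw * phi_twk: the value lambda would take if the
    # corpus consisted of the sampled document t repeated D times.
    lam_hat = [[eta] * vocab_size for _ in range(num_topics)]
    for w, n in counts.items():
        for k in range(num_topics):
            lam_hat[k][w] += num_docs * n * phi[w][k]
    return lam_hat

def natural_gradient_step(lam, lam_hat, rho):
    # The natural gradient is (lam_hat - lam), so lam + rho * (lam_hat - lam)
    # equals the convex blend (1 - rho) * lam + rho * lam_hat.
    return [[l + rho * (h - l) for l, h in zip(row, row_hat)]
            for row, row_hat in zip(lam, lam_hat)]
```

Feeding the E-step output ($\phi$) for one sampled document into `lambda_hat`, then calling `natural_gradient_step` with `step_size(t)`, gives one iteration of the online update; the cost per step depends on the document, not on $D$.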
Experiments
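Use perplexity on held-out data as a measure of model fit:
$\mathrm{perplexity}(n^{test}, \lambda, \alpha) \triangleq \exp\left\{-\frac{\sum_i \log p(n_i^{test} \mid \alpha, \beta)}{\sum_{i,w} n_{iw}^{test}}\right\}$
• $\log p(n_i^{test} \mid \alpha, \beta)$ is intractable, so its variational lower bound is used instead, giving an upper bound on perplexity;
• The per-document variational parameters $\gamma_i$ and $\phi_i$ are fit using the E step in Algorithm 2.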
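Perplexity is the exponentiated negative average per-word log-likelihood over the held-out set, $\exp\{-\sum_i \log p(n_i) / \sum_i N_i\}$ with $N_i$ the number of tokens in held-out document $i$. A minimal sketch, assuming the per-document log-likelihoods (or their lower bounds) are already computed; the function name is hypothetical.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    # exp(-sum_i log p(n_i) / sum_i N_i). Passing per-document lower bounds instead of
    # exact log-likelihoods yields an upper bound on perplexity, since each bound is
    # at most the true log-likelihood.
    total_words = sum(doc_lengths)
    if total_words <= 0:
        raise ValueError("held-out set must contain at least one word")
    return math.exp(-sum(doc_log_likelihoods) / total_words)
```

As a check, a model that assigns every word probability 1/50 has perplexity 50 regardless of document lengths: `perplexity([10 * math.log(1 / 50), 20 * math.log(1 / 50)], [10, 20])` evaluates to 50 (up to rounding). Lower perplexity means better fit.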
• Two corpora: 352,549 documents from the journal Nature, and 100,000 documents from the English version of Wikipedia.
• For each corpus, set aside a 1,000-document test set and a separate 1,000-document validation set.
• Run online LDA for five hours on the remaining documents from each corpus for
Evaluating learning parameters