Nonparametric Bayes Pachinko Allocation
by Li, Blei and McCallum (UAI 2007)
Presented by Lihan He
ECE, Duke University
March 3rd, 2008
Outline
Reviews on Topic Models (LDA, CTM)
Pachinko Allocation (PAM)
Nonparametric Pachinko Allocation
Experimental Results
Conclusions
Notation and terminology
• Word: the basic unit from a vocabulary of size V (containing V distinct words). The vth word is represented by a V-dim unit vector w with w^v = 1 and w^u = 0 for u ≠ v, e.g., w = [0 0 1 0 0].
• Document: a sequence of N words, W = [w_1, w_2, ..., w_N].
• Corpus: a collection of M documents, D = {W_1, W_2, ..., W_M}.
• Topic: a multinomial distribution over words.
Assumptions:
• The words in a document are exchangeable;
• Documents are also exchangeable.
Reviews on Topic Models – Notation
α, β: fixed unknown parameters
M, N, V, k: fixed known parameters
θ, z, w: random variables (w are observable)
Generative process for each document W in a corpus D:
1. Choose θ ~ Dirichlet(α); θ and α are k-dim
2. For each of the N words w_n in the document W
(a) Choose a topic z_n ~ Multinomial(θ)
(b) Choose a word w_n ~ Multinomial(β_{z_n}); β is a k×V matrix with β_{ij} = p(w^j = 1 | z^i = 1)
Reviews on Topic Models - Latent Dirichlet Allocation (LDA)
[Graphical model: θ → z → w, with plates N (words) and M (documents)]
θ is a document-level variable; z and w are word-level variables.
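As a minimal sketch, the LDA generative process above can be simulated directly; the sizes, priors, and topic-word matrix below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: k topics, vocabulary of V words, N words per document.
k, V, N = 3, 10, 20
alpha = np.full(k, 0.5)                    # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.ones(V), size=k)   # k x V topic-word matrix

def generate_document(N):
    """Sample one document of N words via LDA's generative process."""
    theta = rng.dirichlet(alpha)           # 1. theta ~ Dirichlet(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(k, p=theta)         # 2a. z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])       # 2b. w_n ~ Multinomial(beta_{z_n})
        words.append(int(w))
    return words

doc = generate_document(N)
```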
Limitations:
1. Because of the independence assumption implicit in the Dirichlet distribution, LDA is unable to capture correlations between topics.
2. The number of topics k must be selected manually.

For θ ~ Dirichlet(α):
Cov[θ_i, θ_j] = −α_i α_j / (α_0^2 (α_0 + 1)) < 0 for i ≠ j, where α_0 = Σ_{i=1}^k α_i
α_0 is usually very large for the posterior, so Cov[θ_i, θ_j] ≈ 0.
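The small negative covariance between Dirichlet components can be checked numerically; the parameter vector below is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([2.0, 3.0, 5.0])   # illustrative k=3 Dirichlet parameters
a0 = alpha.sum()

# Closed-form covariance for i != j: -alpha_i * alpha_j / (a0^2 * (a0 + 1))
i, j = 0, 1
cov_exact = -alpha[i] * alpha[j] / (a0**2 * (a0 + 1))

# Monte Carlo estimate from samples agrees with the closed form
samples = rng.dirichlet(alpha, size=200_000)
cov_mc = np.cov(samples[:, i], samples[:, j])[0, 1]
```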
Reviews on Topic Models - Latent Dirichlet Allocation (LDA)
Generative process for each document W in a corpus D:
1. Choose η ~ N(μ, Σ)
2. For each of the N words w_n in the document W
(a) Choose a topic z_n ~ Multinomial(f(η))
(b) Choose a word w_n ~ Multinomial(β_{z_n}); β is a k×V matrix

p(z | η) = exp{ η^T z − log Σ_{i=1}^k exp(η_i) }
Reviews on Topic Models - Correlated Topic Models (CTM)
f_i(η) = exp(η_i) / Σ_j exp(η_j)

Key point: the topic proportions are drawn from a logistic normal distribution rather than a Dirichlet distribution:
η ~ N_k(μ, Σ)

[Graphical model: (μ, Σ) → η → z → w, with plates N (words) and M (documents)]
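A minimal sketch of the logistic-normal draw, assuming illustrative values for k, μ, and Σ; the positive covariance between the first two topics is exactly what the Dirichlet cannot express:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4  # illustrative number of topics

# Correlated Gaussian: positive covariance between topics 0 and 1
mu = np.zeros(k)
Sigma = np.eye(k)
Sigma[0, 1] = Sigma[1, 0] = 0.8

eta = rng.multivariate_normal(mu, Sigma)   # eta ~ N_k(mu, Sigma)

# f maps eta onto the simplex: f_i(eta) = exp(eta_i) / sum_j exp(eta_j)
theta = np.exp(eta) / np.exp(eta).sum()
```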
Limitations:
1. Limited to pairwise correlations between topics; the number of parameters in the covariance matrix grows as the square of the number of topics.
2. The number of topics k must be selected manually.

Logistic normal density:
f(θ | μ, Σ) ∝ exp{ −(1/2) (log θ − μ)^T Σ^{−1} (log θ − μ) }
Reviews on Topic Models - Correlated Topic Models (CTM)
Pachinko Allocation Model (PAM)
In PAM, the concept of a topic is extended: topics are distributions not only over words (as in LDA and CTM) but also over other topics.
The structure of PAM is extremely flexible.
Pachinko: a Japanese game, in which metal balls bounce down around a complex collection of pins until they land in various bins at the bottom.
Four-level PAM

[Graphical model: root → super-topic z_r → sub-topic z_t → word w, with plates N and M]

α_r, α_t, β: fixed unknown parameters
M, N, V, k, S: fixed known parameters
θ_r, θ_t, z_r, z_t, w: random variables
Generative process for each document W in a corpus D:
1. Choose θ_r ~ Dirichlet(α_r); θ_r and α_r are S-dim (θ_r: mixing weights for super-topics)
2. For each of the S super-topics, choose θ_t ~ Dirichlet(α_t); θ_t and α_t are k-dim (θ_t: mixing weights for sub-topics)
3. For each of the N words w_n in the document W
(a) Choose a super-topic z_r ~ Multinomial(θ_r)
(b) Choose a sub-topic z_t ~ Multinomial(θ_t^{(z_r)})
(c) Choose a word w_n ~ Multinomial(β_{z_t}); β is a k×V matrix
Pachinko Allocation Model (PAM)
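The four-level PAM generative process can be sketched the same way; the sizes and priors below are illustrative, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: S super-topics, k sub-topics, vocabulary of V words.
S, k, V, N = 2, 4, 10, 30
alpha_r = np.full(S, 1.0)                  # prior over super-topic weights
alpha_t = np.full(k, 0.5)                  # prior over sub-topic weights
beta = rng.dirichlet(np.ones(V), size=k)   # k x V sub-topic/word matrix

def generate_document(N):
    """Sample one document of N words via the four-level PAM process."""
    theta_r = rng.dirichlet(alpha_r)           # 1. super-topic mixing weights
    theta_t = rng.dirichlet(alpha_t, size=S)   # 2. one sub-topic mixture per super-topic
    words = []
    for _ in range(N):
        z_r = rng.choice(S, p=theta_r)         # (a) super-topic
        z_t = rng.choice(k, p=theta_t[z_r])    # (b) sub-topic given super-topic
        words.append(int(rng.choice(V, p=beta[z_t])))  # (c) word
    return words

doc = generate_document(N)
```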
[Topic hierarchy: root → super-topic → sub-topic → word]
Advantage:
Captures correlations between topics via the super-topic layer.
Limitation:
The number of super-topics S and the number of sub-topics k must be selected manually.
Pachinko Allocation Model (PAM)
Nonparametric Pachinko Allocation
Assumes an HDP-based prior for PAM
Based on a 5-level hierarchical Chinese restaurant process
Automatically decides the super-topic number S and the sub-topic number k
Chinese restaurant process:
P(a new customer sits at an occupied table t) = C(t) / (Σ_{t'} C(t') + γ)
P(a new customer sits at an unoccupied table) = γ / (Σ_{t'} C(t') + γ)
where C(t) is the number of customers already at table t; denoted as CRP({C(t)}_t, γ).
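A minimal simulation of these seating probabilities; the number of customers and the concentration γ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def crp(num_customers, gamma):
    """Seat customers one by one under CRP({C(t)}_t, gamma)."""
    counts = []        # C(t): number of customers at table t
    assignments = []
    for _ in range(num_customers):
        total = sum(counts)
        # occupied table t with prob C(t)/(total+gamma),
        # a new table with prob gamma/(total+gamma)
        probs = np.array(counts + [gamma]) / (total + gamma)
        t = int(rng.choice(len(probs), p=probs))
        if t == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[t] += 1
        assignments.append(t)
    return assignments, counts

assignments, counts = crp(100, gamma=2.0)
```

A larger γ makes new tables more likely, so the number of tables (here, topics) grows with the data instead of being fixed in advance.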
Nonparametric Pachinko Allocation
root ↔ restaurant
super-topic ↔ category
sub-topic ↔ dish
word ↔ customer
Notation:
There are infinitely many super-topics and sub-topics.
Both super-topic (category) and sub-topic (dish) are globally shared among all documents.
Sampling for super-topics involves two-level CRP.
Sampling for sub-topics involves three-level CRP.
Nonparametric Pachinko Allocation
Generative process:
A customer x arrives at restaurant rj
1. He chooses the kth entryway e_jk in the restaurant from CRP({C(j, k)}_k, γ_0).
2. If e_jk is a new entryway, a category c_l is associated with it from CRP({Σ_{j'} C(j', l)}_l, γ_0).
3. After choosing the category, the customer decides which table he will sit at. He chooses table t_jln from CRP({C(j, l, n)}_n, γ_1).
4. If the customer sits at an existing table, he shares the menu and dish with the other customers at the same table. Otherwise, he chooses a menu m_lp for the new table from CRP({Σ_{j'} C(j', l, p)}_p, γ_1).
5. If the customer gets an existing menu, he eats the dish on the menu. Otherwise, he samples a dish d_m for the new menu from CRP({Σ_{l'} C(l', m)}_m, γ_1).
Nonparametric Pachinko Allocation
Graphical Model
Model parameters: scalar concentration parameters (γ_0, γ_1, ...) and the base measure H.
Two-level clustering of indicator variables: the first level clusters with a 2-layer CRP and the second level with a 3-layer CRP. Atoms are all drawn from the base H.
[Graphical model with plates N (words) and M (documents)]
Experimental Results
Datasets:
20 newsgroup comp5 dataset: 5 different newsgroups, 4,836 documents, including 468,252 words and 35,567 unique words.
Rexa dataset: digital library of computer science. Randomly choose 5,000 documents, including 350,760 words and 25,597 unique words.
NIPS dataset: 1,647 abstracts of NIPS papers from 1987-1999, including 114,142 words and 11,708 unique words.
Likelihood Comparison:
Experimental Results
Topic Examples
20 newsgroup comp5 dataset
Experimental Results
Topic Examples
NIPS dataset
Nonparametric Bayes PAM discovers the sparse structure.
Conclusions
A nonparametric Bayesian prior for pachinko allocation is presented based on a variant of the hierarchical Dirichlet process;
Nonparametric PAM automatically discovers topic correlations and determines the numbers of topics at different levels;
The topic structure discovered by nonparametric PAM is usually sparse.
Appendix: Hierarchical Latent Dirichlet Allocation (hLDA)
[Graphical model: z → w, with plates N (words) and M (documents)]
Key difference from LDA:
Topics are organized in an L-level tree structure, instead of as a k×V matrix β.
L is prespecified manually.
Generative process for each document W in a corpus D:
1. Choose a path from the root of the topic tree to a leaf. The path includes L topics.
2. Choose θ ~ Dirichlet(α); θ and α are L-dim
3. For each of the N words w_n in the document W
(a) Choose a topic z_n ~ Multinomial(θ)
(b) Choose a word w_n ~ Multinomial(β^{(z_n)}); β^{(z_n)} is a V-dim vector, the multinomial parameter for the z_n-th topic along the path from root to leaf chosen in step 1.
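Assuming a fixed path of L topics (in full hLDA the path itself is drawn from a nested CRP over the topic tree; it is fixed here only for illustration), the per-document sampling can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: an L-level path, vocabulary of V words, N words.
L, V, N = 3, 10, 20
alpha = np.full(L, 1.0)

# A fixed path: one word distribution per level of the tree (L x V).
path = rng.dirichlet(np.ones(V), size=L)

theta = rng.dirichlet(alpha)                     # 2. theta ~ Dirichlet(alpha), L-dim
words = []
for _ in range(N):
    z = rng.choice(L, p=theta)                   # 3a. choose a level along the path
    words.append(int(rng.choice(V, p=path[z])))  # 3b. word from beta^{(z_n)}
```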
References:
W. Li, D. M. Blei, and A. McCallum. Nonparametric Bayes pachinko allocation. In Proceedings of Conference on Uncertainty in Artificial Intelligence (UAI), 2007.
W. Li and A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In Proceedings of International Conference on Machine Learning (ICML), 2006.
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3: 993-1022, 2003.
D. M. Blei and J. D. Lafferty. Correlated topic model. In Advances in Neural Information Processing Systems (NIPS), 2006.
D. M. Blei, T. L. Griffiths, M. I. Jordan, and J. B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems (NIPS), 2004.
J. Aitchison and S. M. Shen. Logistic-normal distributions: some properties and uses. Biometrika, vol. 67, no. 2, pp. 261-272, 1980.