Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Sampling Table Configulations for theHierarchical Poisson-Dirichlet Process
Changyou Chen1,2, Lan Du1,2, Wray Buntine2,1
1ANU College of Engineering and Computer ScienceThe Australian National University
2National ICT, Australia
September 7, 2011
1 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
Outline
1 The Poisson-Dirichlet Process
2 Table Indicator Representation of the HPDP
3 Experiments: Topic Modeling using the HPDP
4 Conclusion
2 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
What can We Do with the Poisson-Dirichlet Process?Applications of the Poisson-Dirichlet process:
Topic modeling: Finding meaningful topics discussed inlarge scale documents. Beneficial to automatic documentanalysis and understanding.Computational linguistic: e.g., the n-gram model, adaptorgrammar.Computer vision. Using PDP/HPDP for image annotation,image segmentation, scene learning, and etc.Others: Data compression, relational modeling, etc.
Cows, grass, hill, sky, tree... · · ·
3 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
What is the Poisson-Dirichlet Process?
The Poisson-Dirichlet process takes as input a basedistribution, and yields as output a discreet distributionwhich is somewhat similar to the base distribution.
0 1 2 3 4 5 6 7 8 9 100
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
y = f (x)
7−→
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
y =K
∑i=0
piδxi
4 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
What is the Poisson-Dirichlet Process?The Poisson-Dirichlet process (PDP) is a randomprobability measure, or a distribution over distributions.The basic form of the PDP is:
∞
∑k=1
pkδX∗k (·) (1)
where~p = (p1,p2, ...) is a probability vector so 0≤ pk ≤ 1and ∑
∞k=1 pk = 1. Also, δX∗k (·) is a discrete measure
concentrated at X∗k . The values X∗k ∈X are i.i.d. drawnfrom the base measure H(·).The one parameter version of the PDP is the Dirichletprocess (DP), and the two parameter version is thePitman-Yor process.
5 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
Why Poisson-Dirichlet Processes?It allows us to extend finite mixture models to infinitemixture models, by putting a PDP prior on the mixturecomponents.
x
γ
c
θN∞
c∼ P(γ|λ ), x∼ P(x|θc)
P(γ|λ ) is a infinite discreet distribution with parameter λ ,
or a Poisson-Dirichlet distribution.
x
G
θ
N
θ ∼ G, x∼ P(x|θ)
G is the distribution over mixture components θ , or a
Poisson-Dirichlet Process.
6 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
Why Poisson-Dirichlet Processes?Easily extend to model hierarchical discreet distributions,e.g., hierarchical Dirichlet processes (HDP) or hierarchicalPoisson-Dirichlet process (HPDP).
e.g.
x
G
θ
N
G0
(a) HDP
e.g.~p0
~p2
~p...
~p1 ~pK
~p... ~pj1
~p... ~pj2 ~p...(b) Probability vector hierarchy
7 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
The Chinese Restaurant Process
G
x
N(a) PDP
p
xN
c
x*
∞
(b) PDP
xN
c
x*
∞
(c) CRP
The Chinese restaurant process (c) (CRP) is themarginalized version of the PDP (b).Sampling the PDP can be done following the CRP’smetaphor.
8 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
The Poisson-Dirichlet Process
The Chinese Restaurant ProcessThe CRP assumes a Chinese restaurant has an infinitenumber of tables, each with infinite capacity, the customersgo in and sit at the restaurant as:
The first customer sits at the first unoccupied table withprobability 1.For the subsequent (n+1)-th customer:
He can choose to sit at an occupied table to share the dishwith other seated customers with probability proportional tothe number of customers that have already sit at that table.or to sit at an unoccupied table with probability proportionalto the sum of strength parameter and the total table count.
1 1 1 1 1 1 2
1 1 2...... 3 ......
9 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
Outline
1 The Poisson-Dirichlet Process
2 Table Indicator Representation of the HPDP
3 Experiments: Topic Modeling using the HPDP
4 Conclusion
10 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
This shows a simple linear hierarchy of 3 Chineserestaurants. We’ll bring customers in and watch theseating.
0
1
2
z: dishu: tableindicator
11 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = ?2
1
0
z =?u = 2
12 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = ?
t = ?
0
1
2
z =?u = 1
13 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = ?
t = ?2
1
0
z =?u = 0
14 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1
0
1
2
z = 1u = 0
15 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 12
1
0
z = 1u = NA
16 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = ?
0
1
2
z =?u = 2
17 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = ?2
1
0
z =?u = 2
18 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1
0
1
2
z = 1u = 2
19 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 12
1
0
z = 1u = NA
20 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1
0
1
2
z = 1u = NA
21 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1 t = ?2
1
0
z =?u = 2
22 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1 t = ?
t = ?
t = 20
1
2
z =?u = 0
23 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1 t = 2
t = 2
t = 2
2
1
0
z = 2u = 0
24 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1 t = 2
t = 2
t = 2
t = ?
0
1
2
z =?u = 2
25 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1 t = 2
t = 2
t = 2
t = ?
t = ?
2
1
0
z =?u = 1
26 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1 t = 2
t = 2
t = 2
t = ?
t = ?
t = 30
1
2
z =?u = 0
27 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
t = 1
t = 1
t = 1 t = 1 t = 2
t = 2
t = 2
t = 3
t = 3
t = 3
2
1
0
z = 3u = 0
28 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP in the Chinese Restaurant Process Metaphor
We can also add customers at any restaurant.
t = 1
t = 1
t = 1 t = 1 t = 2
t = 2
t = 2
t = 3
t = 3
t = 3 ......
......
......
0
1
2
29 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
HPDP Statistics used in Sampling
Seatingarrangementsrepresenation
t0,1 = 1,m0,1,1 = 2t0,2 = 1,m0,2,1 = 3t0,3 = 1,m0,3,1 = 1t1,1 = 1,m1,1,1 = 4t1,2 = 1,m1,2,1 = 1t1,3 = 1,m1,3,1 = 2t2,1 = 2,m2,1,1 = 2,
m2,1,2 = 3t2,2 = 1,m2,2,1 = 2t2,3 = 1,m2,3,1 = 1
~m = counts at eachtable
t = 1
t = 1
t = 1 t = 1 t = 2
t = 2
t = 2
t = 3
t = 3
t = 3 ......
......
......
z = 1u = 0
z = 1u = NA
z = 1u = 2
z = 2u = 0
z = 3u = 0
...... ...... ......
0
1
2
Table indicatorsrepresentation
n0,1 = 2, t0,1 = 1n0,2 = 3, t0,2 = 1n0,3 = 1, t0,3 = 1n1,1 = 4, t1,1 = 1n1,2 = 1, t1,2 = 1n1,3 = 2, t1,3 = 1n2,1 = 5, t2,1 = 2n2,2 = 2, t2,2 = 1n2,3 = 1, t2,3 = 1
~t = number of ta-bles
30 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
Table Indicator Representation of the HPDP
The above u is called the table indicator, defined as:
Definition (Table indicator ul)
The table indicator ul for each customer l is an auxiliary latentvariable which indicates up to which level in the restauranthierarchy l has contributed a table count (i.e. activated a newtable).
It is enough to use this variable to represent the HPDP,e.g., other statistics (#customers and #tables in eachrestaurant) can be reconstructed from the table indicatorseasily.A block Gibbs sampler can be derived using thisrepresentation.
31 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
Table Counts Representation of the HPDP
Theorem: (Teh et. al., 2006 [1]; Buntine and Hutter, 2010 [2])Posterior of the HPDP with table count representation:
Pr(~z1:J,~t1:J | H0) = ∏j≥0
((bj|aj)Tj
(bj)Nj∏
kSnjk
tjk,aj
)
njk : #customers in the j-th restaurant eating dish k
tjk : #tables in the j-th restaurant serving dish k
Nj : = ∑k
njk, Tj := ∑k
tjk
SNM,a : the generalized Stirling number
(x|y)N : the Pochhammer symbol with increment y (2)H0 : base distribution
32 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
Table Indicator Representation of the HPDP
TheoremPosterior of the HPDP in our representation:
Pr(~z1:J,~u1:J | H0) = ∏j≥0
((bj|aj)Tj
(bj)Nj∏
kSnjk
tjk,aj
tjk!(njk− tjk)!njk!
)
the symbols are the same with the previous ones except that tjkcan be constructed from the table indicators:
tjk = ∑j′∈T(j)
∑l∈D(j′)
δzl=kδul≤d(j)
33 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Table Indicator Representation of the HPDP
How does our Sampler Work?
Sampling a DP with dim = 50, data=500 points, true conc.=100:
389
390
391
392
393
394
395
396
0 500 1000 1500 2000 2500 3000 3500 4000
Mean table
s
Time (ms)
One run with K=50, N=500, disc.=0, conc.=100
Sampling with seating arrangementsTable indicator sampler
Collapsed table sampler
(a) Estimate of total table counts
0
1
2
3
4
5
6
0 500 1000 1500 2000 2500 3000 3500 4000
Devia
tion for
mean table
s
Time (ms)
Mean over 20 runs from with K=50, N=500, disc.=0, conc.=100
Sampling with seating arrangementsTable indicator sampler
Collapsed table sampler
(b) Std.Dev. of total table counts
97
98
99
100
101
102
103
0 500 1000 1500 2000 2500 3000 3500 4000
Mean c
oncentr
ation
Time (ms)
One run with K=50, N=500, disc.=0, conc.=100
Sampling with seating arrangementsTable indicator sampler
Collapsed table sampler
(c) Estimate of concentration par. b
0
0.5
1
1.5
2
2.5
3
0 500 1000 1500 2000 2500 3000 3500 4000
Devia
tion for
concentr
ation
Time (ms)
Mean over 20 runs with K=50, N=500, disc.=0, conc.=100
Sampling with seating arrangementsTable indicator sampler
Collapsed table sampler
(d) Std.Dev. of concentration par. b
34 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Experiments: Topic Modeling using the HPDP
Outline
1 The Poisson-Dirichlet Process
2 Table Indicator Representation of the HPDP
3 Experiments: Topic Modeling using the HPDP
4 Conclusion
35 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Experiments: Topic Modeling using the HPDP
Datasets, Compared Algorithms and Evaluations
We use five datasets (from Blogs, Reuters, NIPS, UCI):Health Person Obama NIPS Enron
# words 1,119,678 1,656,574 1,382,667 1,932,365 6,412,172# documents 1,655 8,616 9,295 1,500 39,861vocabulary size 12,863 32,946 18,138 12,419 28,102
Compared algorithms:Teh et.al. ’s sampling direct assignment SDA [1]Buntine and Hutter’s collapsed table sampler CTS [2]Our proposed block Gibbs sampler STCSTC initialized with SDA, denoted as STC + SDA
We used the unbiased left-to-right algorithm [3] tocalculate the testing perplexities for the topic models,which is a standard evaluation.
36 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Experiments: Topic Modeling using the HPDP
Perplexities
Table: Test log2(perplexities) on the five datasets
Dataset Health Person ObamaI = 100 I = 200 I = 1000 I = 2000 I = 1000 I = 2000
SDA 11.628281 11.619546 11.930657 11.904425 11.144188 11.134732CTS 11.655493 11.636743 11.940532 11.947740 11.191377 11.174327
SDA+STC 11.582969 11.573457 11.844319 11.829628 11.094079 11.090389STC 11.547999 11.551453 11.858719 11.852253 11.210295 11.201241
Dataset Enron NIPSI = 500 I = 1000 I = 1000 I = 2000
SDA 10.847454 10.768568 10.564221 10.558330SDA+STC 10.768568 10.659724 10.534148 10.518792
STC 10.8530341 10.810127 10.474467 10.425393
1updated result37 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Experiments: Topic Modeling using the HPDP
Convergence Speed
22 9600 19232 28822 36010
11.6
11.8
12
12.2
training time (s)
test
ing
perp
lexi
ty
SDASTCCTSSDA+STC
(a) Health with I = 100
32 9619 19215 28830 36046
11.6
11.8
12
12.2
12.4
training time (s)te
stin
g pe
rple
xity
SDASTCCTSSDA+STC
(b) Health with I = 200
83 18025 32409 46832 61218 75611 68405
12
12.5
13
training time (s)
test
ing
perp
lexi
ty
SDASTCCTSSDA+STC
(c) Person with I = 1000
116 18041 32439 46817 61246 75629
12
12.5
13
13.5
training time (s)
test
ing
perp
lexi
ty
SDASTCCTSSDA+STC
(d) Person with I = 2000
58 19205 38417 57650 7680510.4
10.6
10.8
11
11.2
11.4
training time (s)
test
ing
perp
lexi
ty
SDASTCSDA+STC
(e) NIPS with I = 1000
138 19210 38456 57627 7693410.4
10.6
10.8
11
11.2
11.4
training time (s)
test
ing
perp
lexi
ty
SDASTCSDA+STC
(f) NIPS with I = 2000
38 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Conclusion
Outline
1 The Poisson-Dirichlet Process
2 Table Indicator Representation of the HPDP
3 Experiments: Topic Modeling using the HPDP
4 Conclusion
39 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Conclusion
Conclusion
Proposed a new representation for the HPDP based on theCRP metaphor.All useful statistics of the CRP can be reconstructed fromthe table indicator.No dynamic memory allocations for table counts.A blocked Gibbs sampler can be easily derived, e.g., we donot have to sample the table counts separately.Experimental results on topic modeling indicate fast mixingof the proposed algorithm.All other PDP related applications can be adapted to thisrepresentation.
40 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Conclusion
Reference
Teh, Y.:A Bayesian interpretation of interpolated Kneser-Ney.Technical Report TRA2/06, School of Computing, NationalUniversity of Singapore (2006)
Buntine, W., Hutter, M.:A Bayesian review of the Poisson-Dirichlet process.Technical Report arXiv:1007.0296, NICTA and ANU,Australia (2010)
Buntine, W.:Estimating likelihoods for topic models.In: ACML ’09. (2009) 51–64
41 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process
Conclusion
Thanks for your attention!!!
42 Changyou Chen, Lan Du, Wray Buntine Sampling Table Configulations for the Hierarchical Poisson-Dirichlet Process