
Technical Foundations and Inference

Topic Model Tutorial - Part 2 Hannover, 2016

Arnim Bleier
[email protected]

Why should we care?

● Probabilistic Graphical Models are a general framework for representing assumptions about the (in)dependence between random variables.

● Knowing the inner workings of Topic Models helps us to better interpret their results.


Outline

● Generative storylines & Plates

● Gibbs sampling

● Simple Topic Model

● Latent Dirichlet Allocation

Recap: Conference dinner

Which k does the next guest choose?

Observations so far: k1, k2, k1, k3

Probabilities:

5/10 for k1
2/10 for k2
3/10 for k3

Recap: Conference dinner

Observations so far: k1, k2, k1, k3. Which k comes next?

General case:

p(k) = (N_k + α_k) / Z

where N_k is the number of observations in k, α_k a pseudo-count for k, and Z the normalizing constant.

Generative Storyline

Probability of the (N+1)-th observation given the first N:

p(k_{N+1} = k | k_1, …, k_N) = (N_k + α_k) / Z

The pseudo-counts α_k play the role of the prior.
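A minimal Python sketch of this storyline (the function and variable names are illustrative, not from the slides): each new draw is proportional to the observed count plus its pseudo-count.

    import random

    def sample_next(counts, alphas):
        """Draw the next observation with p(k) proportional to N_k + alpha_k."""
        weights = [n + a for n, a in zip(counts, alphas)]
        return random.choices(range(len(counts)), weights=weights)[0]

    # Observations k1, k2, k1, k3 give counts N = (2, 1, 1); the prior values are illustrative.
    counts, alphas = [2, 1, 1], [1.0, 1.0, 1.0]
    k_next = sample_next(counts, alphas)
    counts[k_next] += 1  # the new observation in turn becomes part of the counts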

Plate Notation

(Figure: graphical model in plate notation; a rectangle, the plate, encloses variables that are repeated, with the index i ranging over the repetitions.)


Gibbs sampling


Iteratively sample each variable conditioned on all other variables.

Gibbs sampling

(Figure: trace of the sampled values; starting from the prior, the chain moves over the iterations towards the stationary distribution.)
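To make the iteration concrete, here is a minimal sketch (my example, not from the slides) of a Gibbs sampler for a standard bivariate normal with correlation rho, where each full conditional is itself a normal distribution:

    import random

    def gibbs_bivariate_normal(rho, iters=1000):
        """Alternately draw each coordinate conditioned on the current value of the other."""
        x, y = 10.0, -10.0                 # start deliberately far from the stationary distribution
        sd = (1.0 - rho ** 2) ** 0.5
        samples = []
        for _ in range(iters):
            x = random.gauss(rho * y, sd)  # x | y ~ N(rho * y, 1 - rho^2)
            y = random.gauss(rho * x, sd)  # y | x ~ N(rho * x, 1 - rho^2)
            samples.append((x, y))
        return samples                     # discard an initial burn-in before using the draws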

Simple Topic Model

Generative Storyline:

● Draw a global distribution over topics.

● For each document d, draw a topic z_d.

Simple Topic Model *

Generative Storyline:

● For each topic k, draw a distribution over the vocabulary.

● For each document d, draw the words w_d from the topic indexed by z_d.

* Mixture of Unigrams
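A sketch of this generative storyline in Python, with illustrative sizes and symmetric Dirichlet draws (theta for the global topic distribution and phi for the per-topic word distributions are my names, not the slides'):

    import numpy as np

    rng = np.random.default_rng(0)
    K, V, D, N = 3, 1000, 100, 50            # topics, vocabulary size, documents, words per document

    theta = rng.dirichlet(np.ones(K))        # global distribution over topics
    phi = rng.dirichlet(np.ones(V), size=K)  # one distribution over the vocabulary per topic

    docs = []
    for d in range(D):
        z_d = rng.choice(K, p=theta)             # one topic for the whole document
        w_d = rng.choice(V, size=N, p=phi[z_d])  # every word of d is drawn from that topic
        docs.append(w_d)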

Simple Topic Model

Likelihood of document d being generated from topic k: *

p(w_d | z_d = k) = ∏_{i=1}^{N_d} p(w_di | z_d = k)

* Approximation not considering the dependence of words within documents.
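In code, this product is best evaluated in log space to avoid numerical underflow. A sketch reusing phi and docs from the snippet above:

    import numpy as np

    def log_likelihood(w_d, k, phi):
        """log p(w_d | z_d = k): sum of the log probabilities of the words under topic k."""
        return np.log(phi[k][w_d]).sum()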

Simple Topic Model

p(z_d = k | w_d) = ?

We need to know from which topic k document d was generated. Combining the global distribution over topics with the per-topic likelihood of the document gives

p(z_d = k | w_d) ∝ p(z_d = k) · ∏_{i=1}^{N_d} p(w_di | z_d = k)

normalized over the topics k. We can now sample the membership for document d and update the model.
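A sketch of this sampling step (whether the tutorial updates explicit parameters or collapsed counts is not recoverable here; this version assumes current estimates theta and phi, names of my choosing):

    import numpy as np

    def gibbs_sweep(docs, theta, phi, rng):
        """Resample every document's topic: p(z_d = k | w_d) ∝ theta_k * prod_i phi[k, w_di]."""
        K = len(theta)
        z = np.empty(len(docs), dtype=int)
        for d, w_d in enumerate(docs):
            log_p = np.log(theta) + np.log(phi[:, w_d]).sum(axis=1)
            p = np.exp(log_p - log_p.max())  # stabilize before normalizing
            z[d] = rng.choice(K, p=p / p.sum())
        return z                             # then re-estimate theta and phi from the new z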

Latent Dirichlet Allocation

Generative Storyline:

● For each document d, draw a document-specific distribution over topics.

● For each topic k, draw a distribution over the vocabulary.

● For each word position i in document d, draw a topic z_di from the document's topic distribution, then draw the word w_di from that topic.


Latent Dirichlet Allocation

Likelihood of word i in document d being generated from topic k.
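The per-word membership is resampled much as the per-document one was before. The standard way to do this is the collapsed Gibbs update (see, e.g., Heinrich 2008 in the references); the count arrays ndk, nkw, nk and the symmetric priors alpha, beta below are illustrative names, not the slides' notation:

    import numpy as np

    def resample_token(d, w, k_old, ndk, nkw, nk, alpha, beta, rng):
        """Collapsed Gibbs update for one token w in document d:
        p(z = k | rest) ∝ (ndk[d, k] + alpha) * (nkw[k, w] + beta) / (nk[k] + V * beta)."""
        V = nkw.shape[1]
        ndk[d, k_old] -= 1; nkw[k_old, w] -= 1; nk[k_old] -= 1  # remove the old assignment
        p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
        k_new = rng.choice(len(nk), p=p / p.sum())
        ndk[d, k_new] += 1; nkw[k_new, w] += 1; nk[k_new] += 1  # record the new assignment
        return k_new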

Associated Press Topics (Simple Topic Model)

(Figure: top words of example topics learned from the Associated Press corpus.)

Associated Press Topics (LDA)

(Figure: top words of example topics learned from the Associated Press corpus.)


Conclusions

● Topic Models can be formulated within the wider framework of Probabilistic Graphical Models.

● Different variants of Topic Models can be formulated within this framework.

● More complex models are not necessarily better.

● However, more complex models can help to express assumptions about the dataset.

Thank you!


References

● M. Steyvers and T. Griffiths. Probabilistic topic models. In: Latent Semantic Analysis: A Road to Meaning, 2007.

● G. Heinrich. Parameter estimation for text analysis, 2008.

● P. Resnik and E. Hardisty. Gibbs sampling for the uninitiated, 2010.

● M. D. Lee and E. J. Wagenmakers. Bayesian Cognitive Modeling: A Practical Course, 2014.

● S. Jackman. Bayesian Analysis for the Social Sciences, 2009.