Hierarchical Topic Models and the Nested Chinese Restaurant Process Blei, Griffiths, Jordan, Tenenbaum presented by Rodrigo de Salvo Braz

Page 1: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Hierarchical Topic Models and the Nested Chinese Restaurant Process

Blei, Griffiths, Jordan, Tenenbaum

presented by Rodrigo de Salvo Braz

Page 2: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Document classification

• One-class approach: one topic per document, with words generated according to the topic.

• For example, a Naive Bayes model.

Page 3: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Document classification

• It is more realistic to assume more than one topic per document.

• Generative model: pick a mixture distribution over K topics and generate words from it.

Page 4: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Document classification

• Even more realistic: topics may be organized in a hierarchy (not independent);

• Pick a path from root to leaf in a tree; each node is a topic; sample from the mixture.

Page 5: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Dirichlet distribution (DD)

• Distribution over probability vectors p of dimension K: $P(p; u) = \frac{1}{Z(u)} \prod_{i=1}^{K} p_i^{u_i}$

• The parameters u act as a prior distribution (“previous observations”);

• A symmetric Dirichlet distribution assumes a uniform prior distribution ($u_i = u_j$ for all i, j).
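As a minimal sketch, assuming the density on this slide is proportional to the product of p_i^{u_i} (which corresponds to the standard Dirichlet with concentration u_i + 1), the following Python draws a sample and evaluates the log density. The function names `sample_dirichlet` and `log_density` are illustrative and do not come from the slides.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def log_density(p, u):
    """Log of (1/Z(u)) * prod_i p_i**u_i, assuming Z(u) is the Dirichlet normalizer."""
    alpha = [ui + 1.0 for ui in u]  # standard concentration parameters
    log_z = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    return sum(ui * math.log(pi) for ui, pi in zip(u, p)) - log_z

rng = random.Random(0)
u = [2.0, 2.0, 2.0]  # symmetric case: u_i = u_j for all i, j
p = sample_dirichlet([ui + 1.0 for ui in u], rng)
print(p, sum(p))
print(log_density([1 / 3, 1 / 3, 1 / 3], u))
```

With all u_i equal to zero the density is flat over the simplex, which matches the reading of u as counts of previous observations.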

Page 6: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Latent Dirichlet Allocation (LDA)

• Generative model of multiple-topic documents;

• Generate a mixture distribution on topics using a Dirichlet distribution;

• For each word, pick a topic according to this mixture distribution and generate the word according to that topic's word distribution.
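The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the vocabulary, the two toy topics, and the names `sample_dirichlet` and `generate_document` are made up, and a symmetric Dirichlet with parameter `alpha` is assumed for the topic mixture.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, n_words, rng):
    """topics[k] is a list of word probabilities aligned with vocab."""
    # Step 1: draw this document's mixture over the K topics.
    theta = sample_dirichlet([alpha] * len(topics), rng)
    words = []
    for _ in range(n_words):
        # Step 2: pick a topic for this word from the mixture.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Step 3: pick the word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topics[z])[0])
    return theta, words

# Toy, made-up topics over a four-word vocabulary.
vocab = ["neuron", "spike", "kernel", "margin"]
topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
theta, words = generate_document(topics, vocab, alpha=1.0, n_words=10,
                                 rng=random.Random(0))
print(theta)
print(words)
```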

Page 7: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Latent Dirichlet Allocation (LDA)

[Figure: LDA graphical model. Labels: DD hyperparameter, topic distribution, topics (K), words w (W).]

Page 8: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Chinese Restaurant Process (CRP)

[Animation, pages 8 to 16: nine customers are seated one at a time, from "1 out of 9 customers" to "9 out of 9 customers". Label on the last frame: data point (a distribution itself) sampled.]
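The seating animation can be simulated with a short sketch. It assumes the usual CRP rule, which the slides do not spell out: a new customer joins an occupied table with probability proportional to its current occupancy, or opens a new table with probability proportional to a parameter `gamma`. The name `crp_seating` is illustrative.

```python
import random

def crp_seating(n_customers, gamma, rng):
    """Seat customers one by one; return each customer's table and the table sizes."""
    tables = []       # tables[i] = number of customers at table i
    assignment = []   # assignment[n] = table index chosen by customer n
    for _ in range(n_customers):
        # Occupied tables are weighted by occupancy, a new table by gamma.
        weights = tables + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append(1)      # open a new table
        else:
            tables[choice] += 1   # join an existing table
        assignment.append(choice)
    return assignment, tables

assignment, tables = crp_seating(9, gamma=1.0, rng=random.Random(0))
print(assignment)
print(tables)
```

The number of tables is not fixed in advance, which is what lets the model grow with the data.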

Page 17: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Species Sampling Mixture

• Generative model of multiple-topic documents;

• Generate a mixture distribution on topics using a CRP prior;

• For each word, pick a topic according to this mixture distribution and generate the word according to that topic's word distribution.

Page 18: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Species Sampling Mixture

[Figure: species sampling mixture graphical model. Labels: CRP hyperparameter, topic distribution, topics (K), words w (W).]

Page 19: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Nested CRP

[Figure: nested CRP diagram with customer labels 1 through 6, each label appearing three times.]
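A hedged sketch of how a nested CRP could assign each customer a root-to-leaf path: the ordinary CRP seating rule is applied again at every node along the way. The tree depth of 3, the parameter `gamma`, and the name `ncrp_path` are assumptions made for illustration.

```python
import random

def ncrp_path(tree, depth, gamma, rng):
    """Pick a path of child indices from the root; tree maps a node to child counts."""
    path = ()  # the root is the empty tuple
    for _ in range(depth - 1):
        counts = tree.setdefault(path, [])
        # CRP choice among the children of the current node.
        weights = counts + [gamma]
        child = rng.choices(range(len(weights)), weights=weights)[0]
        if child == len(counts):
            counts.append(1)      # create a new child node
        else:
            counts[child] += 1    # follow an existing child
        path = path + (child,)
    return path

rng = random.Random(0)
tree = {}
for customer in range(1, 7):
    print(customer, ncrp_path(tree, depth=3, gamma=1.0, rng=rng))
```

Customers that share a prefix of their path share the corresponding nodes, so nodes near the root are shared by more customers than nodes near the leaves.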

Page 20: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Hierarchical LDA (hLDA)

• Generative model of multiple-topic documents;

• Generate a mixture distribution on topics using a Nested CRP prior;

• For each word, pick a topic according to this mixture distribution and generate the word according to that topic's word distribution.
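A rough sketch of generating one hLDA document, combining the steps above with the root-to-leaf path idea from Page 4. The assumptions are mine and are not stated on the slides: each node's word distribution is drawn from a symmetric Dirichlet with parameter `eta`, the mixture over levels of the path is drawn from a symmetric Dirichlet with parameter `alpha`, and all names are illustrative.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def draw_path(tree, depth, gamma, rng):
    """Nested CRP: return the nodes on a root-to-leaf path; the root is ()."""
    nodes = [()]
    for _ in range(depth - 1):
        counts = tree.setdefault(nodes[-1], [])
        child = rng.choices(range(len(counts) + 1), weights=counts + [gamma])[0]
        if child == len(counts):
            counts.append(0)  # new child, counted just below
        counts[child] += 1
        nodes.append(nodes[-1] + (child,))
    return nodes

def generate_document(tree, topics, vocab, depth, n_words, gamma, alpha, eta, rng):
    """Generate one document; tree and topics are shared across documents."""
    nodes = draw_path(tree, depth, gamma, rng)
    for node in nodes:
        if node not in topics:  # lazily create a topic for a new node
            topics[node] = sample_dirichlet([eta] * len(vocab), rng)
    theta = sample_dirichlet([alpha] * depth, rng)  # mixture over levels of the path
    words = []
    for _ in range(n_words):
        level = rng.choices(range(depth), weights=theta)[0]
        words.append(rng.choices(vocab, weights=topics[nodes[level]])[0])
    return nodes, words

rng = random.Random(0)
tree, topics = {}, {}
vocab = ["w%d" % i for i in range(25)]  # placeholder 25-term vocabulary
for _ in range(3):
    nodes, words = generate_document(tree, topics, vocab, depth=3, n_words=8,
                                     gamma=1.0, alpha=1.0, eta=0.5, rng=rng)
    print(nodes[-1], words)
```

Every document passes through the root, so the root topic is shared by the whole collection, while deeper topics are shared only by documents on the same branch.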

Page 21: Hierarchical Topic Models and the Nested Chinese Restaurant Process

hLDA graphical model

Page 22: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Artificial data experiment

100 documents of 1000 words each, over a 25-term vocabulary

Each vertical bar is a topic

Page 23: Hierarchical Topic Models and the Nested Chinese Restaurant Process

CRP prior vs. Bayes Factors

Page 24: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Predicting the structure

Page 25: Hierarchical Topic Models and the Nested Chinese Restaurant Process

NIPS abstracts

Page 26: Hierarchical Topic Models and the Nested Chinese Restaurant Process

Comments

• Accommodates growing collections of data;

• Hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for that;

• No mention of running time; maybe inference takes a very long time.