Introduction to LDA Jinyang Gao

Page 1

Introduction to LDA

Jinyang Gao

Page 2

Outline

• Bayesian Analysis
• Dirichlet Distribution
• Evolution of Topic Model
• Gibbs Sampling
• Intuition Analysis of Parameter Setting

Page 4

Bayesian Analysis

• Suppose we have some coins; on average they come up FRONT with probability 0.75.

• We toss one coin. How should we estimate the outcome?

• FRONT: 0.75 BACK: 0.25
• Prior Estimation

Page 5

Bayesian Analysis

• Suppose we toss a coin 100 times, and we observe that 25 of the tosses are FRONT.

• How should we estimate the next toss?

• FRONT: 0.25 BACK: 0.75
• Maximum Likelihood Estimation

Page 6

Bayesian Analysis

• Can we make a trade-off between the prior and the observations?

• The prior is NOT certain to be some fixed value.
– Change 0.75 to a distribution Beta(u|15, 5).

• Add the posterior observations (5 FRONT, 15 BACK):
– Beta(u|15, 5) becomes Beta(u|20, 20).

• Then calculate the expectation, etc.
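The update above can be checked numerically. This is a minimal sketch of the Beta-Binomial conjugate update; the function names (`beta_update`, `beta_mean`) are invented for this illustration:

```python
# Beta-Binomial conjugate update: a Beta(a, b) prior plus observed
# counts (front, back) gives a Beta(a + front, b + back) posterior.

def beta_update(a, b, front, back):
    """Return posterior Beta parameters after observing coin tosses."""
    return a + front, b + back

def beta_mean(a, b):
    """Expectation of a Beta(a, b) random variable."""
    return a / (a + b)

# Prior Beta(15, 5): mean 0.75, as on the slide.
a, b = 15, 5
print(beta_mean(a, b))            # 0.75

# Observe 5 FRONT and 15 BACK -> posterior Beta(20, 20), mean 0.5.
a, b = beta_update(a, b, 5, 15)
print(beta_mean(a, b))            # 0.5
```

Note how the posterior mean 0.5 sits between the prior mean 0.75 and the observed frequency 0.25, weighted by the prior's pseudo-counts.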

Page 7

Bayesian Analysis

• Key idea:
– Express the uncertainty of the prior estimate as a distribution.
– The distribution converges to a single value as observations accumulate.
– Few observations: the prior dominates the estimate.
– Many observations: the posterior observations dominate.
– If we have absolute confidence in the prior (a single fixed value), no amount of observation will change the estimate.

Page 8

Outline

• Bayesian Analysis
• Dirichlet Distribution
• Evolution of Topic Model
• Gibbs Sampling
• Intuition Analysis of Parameter Setting

Page 9

Dirichlet Distribution

• Some properties:
– Conjugate prior of the multinomial: a Dirichlet(α) prior plus counts (n_1, …, n_K) gives a Dirichlet(α_1 + n_1, …, α_K + n_K) posterior.
– Expectation: E[u_i] = α_i / Σ_j α_j.

• It is just a smoothing method: add α_i to the observed count of each choice.

Page 10

Dirichlet Distribution

• A Dirichlet distribution with parameter α EQUALs the smoothing method that adds α_i to the observed count of each choice.

• Here EQUAL holds when we only care about the expectation, but that covers most applications!

• Don't be deterred by the definition: it is just Laplace smoothing (when we set all α_i equal to 1), or other smoothing methods, expressed in a Bayesian way!
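The claimed equivalence can be verified directly: the posterior mean under a Dirichlet prior is exactly an additively smoothed frequency estimate. A minimal sketch (the function name and the counts are made up for illustration):

```python
# Posterior mean under a Dirichlet(alpha) prior equals the additively
# smoothed frequency estimate: (n_i + alpha_i) / (N + sum(alpha)).

def dirichlet_posterior_mean(counts, alpha):
    """Expected probability of each choice under the Dirichlet posterior."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

counts = [8, 1, 1]          # observed counts for three choices
alpha = [1.0, 1.0, 1.0]     # all alpha_i equal to 1 -> Laplace (add-one) smoothing

print(dirichlet_posterior_mean(counts, alpha))
# same numbers that add-one smoothing gives: [9/13, 2/13, 2/13]
```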

Page 11

Outline

• Bayesian Analysis
• Dirichlet Distribution
• Evolution of Topic Model
• Gibbs Sampling
• Intuition Analysis of Parameter Setting

Page 12

Evolution of Topic Model

• Here we give some solutions, from NAÏVE to LDA:
– K-means (TF-vector version)
– K-means with KL-divergence (language-model version)
– PLSA (fixed topic-frequency prior)
– LDA (based on topic-frequency observations and smoothing)

Page 13

Evolution of Topic Model

• K-means with TF vectors:
– We begin with the simplest model.
– Just cluster the documents!
– Each document is a vector of term frequencies.
– How to cluster? K-means!
– Each cluster is a topic.
– Each topic is a TF vector (the cluster centroid).
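The whole pipeline above fits in a few lines. A minimal plain K-means sketch over term-frequency vectors (toy data and invented names; a real run would build the TF vectors from a corpus):

```python
import random

def kmeans_tf(vectors, k, iters=20, seed=0):
    """Plain K-means over term-frequency vectors (Euclidean distance)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]  # Forgy init
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each document goes to its nearest center.
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(v, centers[c])))
        # Update step: each center ("topic") becomes its cluster's mean TF vector.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

# Four toy documents over a 4-word vocabulary, two obvious word groups.
docs = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
assign, centers = kmeans_tf(docs, k=2)
print(assign)
```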

Page 14

Evolution of Topic Model

• Problems of K-means with TF vectors:
– High-frequency words dominate (IDF, log-TF, and stop-word removal can help somewhat).
– Correlation among words is ignored.
– A cluster often collapses to a single word rather than a topic (implement it and you will see).

Page 15

Evolution of Topic Model

• K-means with KL-divergence:
– A generative model of text.
– Each text is a probability distribution over words.
– Still just clustering the documents.
– K-means, but with KL-divergence (not cosine or Euclidean distance).
– Each cluster is a topic.
– Each topic is a distribution over words.
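The assignment step of this variant can be sketched as follows, replacing Euclidean distance with KL divergence from the document's word distribution to each cluster's language model (function names are invented for this sketch):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary.
    Assumes q has no zero entry where p > 0 (in practice the cluster
    language model would be smoothed to guarantee this)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def nearest_cluster(doc_dist, cluster_dists):
    """Assign a document to the cluster minimizing KL(doc || cluster)."""
    return min(range(len(cluster_dists)),
               key=lambda k: kl_divergence(doc_dist, cluster_dists[k]))

doc = [0.7, 0.2, 0.1]
clusters = [[0.6, 0.3, 0.1],   # close to the document's distribution
            [0.1, 0.1, 0.8]]   # far from it
print(nearest_cluster(doc, clusters))   # 0
```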

Page 16

Evolution of Topic Model

• Problems of K-means with KL-divergence:
– Much better: some topics appear.
– But still not clearly.
– Each document only has one topic?
– It is still just a good clustering method for documents.

Page 17

Evolution of Topic Model

• PLSA/PLSI:
– Each text is a probability distribution over words.
– Each text is also a distribution over topics.
– Topics and words are assigned probabilistically (via EM).
– Each cluster is a topic (but no document belongs entirely to one cluster).
– Each topic is a distribution over words.
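One iteration of the EM procedure mentioned above can be sketched like this. This is a toy re-derivation, not reference PLSA code, and all names are invented: the E-step computes the topic posterior p(z|d,w) ∝ p(z|d)·p(w|z), and the M-step re-estimates both distributions from the expected counts.

```python
def plsa_em_step(counts, p_z_d, p_w_z):
    """One EM iteration of PLSA.
    counts[d][w]: term frequency of word w in document d
    p_z_d[d][z]:  p(z | d)      p_w_z[z][w]: p(w | z)"""
    D, K, W = len(counts), len(p_z_d[0]), len(counts[0])
    new_dz = [[0.0] * K for _ in range(D)]
    new_zw = [[0.0] * W for _ in range(K)]
    for d in range(D):
        for w in range(W):
            if counts[d][w] == 0:
                continue
            # E-step: posterior over topics for this (document, word) pair.
            post = [p_z_d[d][z] * p_w_z[z][w] for z in range(K)]
            s = sum(post)
            post = [p / s for p in post]
            # M-step accumulation: split the observed count across topics.
            for z in range(K):
                new_dz[d][z] += counts[d][w] * post[z]
                new_zw[z][w] += counts[d][w] * post[z]
    # Normalize the expected counts back into distributions.
    p_z_d = [[v / sum(row) for v in row] for row in new_dz]
    p_w_z = [[v / sum(row) for v in row] for row in new_zw]
    return p_z_d, p_w_z

# Tiny corpus: document 0 uses only word 0, document 1 only word 1.
counts = [[2, 0], [0, 3]]
p_z_d = [[0.6, 0.4], [0.5, 0.5]]   # arbitrary positive initialization
p_w_z = [[0.7, 0.3], [0.4, 0.6]]
for _ in range(10):
    p_z_d, p_w_z = plsa_em_step(counts, p_z_d, p_w_z)
print(p_z_d)
```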

Page 18

Evolution of Topic Model

• Problems of PLSA:
– It is the first usable topic model in this evolution!
– General words? Context information? See the works of QZ Mei from 2005-2008.
– What about the K, as in K-means?
– Topics are not all the same size.
– Can two topics with the same distribution merge?
– Can a large topic split?

Page 19

Evolution of Topic Model

• LDA:
– Gives a prior distribution over topics.
– Moves from maximum likelihood estimation (MLE) to Bayesian analysis in the word-to-topic assignments.
– The Dirichlet is the easiest way (it is conjugate to the multinomial)!
– Gives a complete Bayesian analysis.
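LDA's generative story can be sketched directly. A toy illustration: the topics `phi` and the hyperparameters are made-up values, and Dirichlet samples are drawn via normalized Gamma variates.

```python
import random

def sample_dirichlet(alpha):
    """Draw from Dirichlet(alpha) by normalizing Gamma samples."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_discrete(probs):
    """Draw an index from a discrete distribution by inverse CDF."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1   # guard against floating-point rounding

def generate_document(n_words, alpha, phi):
    """phi[k]: topic k's word distribution (assumed given here)."""
    theta = sample_dirichlet(alpha)            # the document's topic mixture
    words = []
    for _ in range(n_words):
        z = sample_discrete(theta)             # pick a topic for this word
        words.append(sample_discrete(phi[z]))  # pick a word from that topic
    return words

phi = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]       # two toy topics over 3 words
doc = generate_document(20, [0.5, 0.5], phi)
print(doc)
```

Inference (Gibbs sampling, below) runs this story in reverse: given only the words, recover the hidden `theta` and `z`.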

Page 20

Evolution of Topic Model

• Analysis of LDA:
– Small topics tend to disappear (even the text at the center of a small topic has a larger probability of being claimed by a large nearby topic), so K is self-adapting here.
– Smoothing in the topic-word distribution.

Page 21

What About Short Text?

• Consider the following:
– Lots of documents have only one meaningful word.
– How many words are enough to form a topic?
– 'blue' and 'red' usually do not co-occur in a short text, but "blue plane" or "red car" do appear.
– ……

Page 22

Evolution of Topic Model

• These are only milestones along this evolution line. Small changes may give different results:
– Text weighting
– General words
– Probabilistic clustering
– Hyperparameters
– Context information
– Hierarchy

Page 23

Evolution of Topic Model

• You SHOULD implement ALL of them if you want to gain a deep understanding of topic models!
– I implemented all of them, on both long and short text, during my undergraduate studies. The code is easy, and the data is also easy to obtain.
– Check some topics (and how they vary across iterations) and find out why they work well or badly.
– You will learn more about each consideration in model inference, and some of the derivations are not difficult to code.

Page 24

Evolution of Topic Model

• You should know why some models are RIGHT, rather than merely performing well in experiments. Otherwise you cannot tell which model is RIGHT for your own problem (where usually some features have changed).

• Study the features of the models, data, and targets carefully. Use Occam's Razor to develop your model.

Page 25

Outline

• Bayesian Analysis
• Dirichlet Distribution
• Evolution of Topic Model
• Gibbs Sampling
• Intuition Analysis of Parameter Setting

Page 26

Gibbs Sampling

• Gibbs sampling:
– Key idea: if all the other parameters are decided, then the decision for a new variable is easy.
– Choose one variable (e.g. one word's topic).
– Fix all the others.
– Sample (do not optimize) based on the others.
– Loop until convergence.
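Specialized to LDA, the loop above becomes the collapsed Gibbs sampler: resample one word's topic given every other assignment, with p(z = k) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta). A compact sketch under invented names, not the paper's reference implementation:

```python
import random

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of documents, each a list of word ids in [0, V).
    Returns the final topic assignment z[d][i] for every word."""
    rng = random.Random(seed)
    n_dk = [[0] * K for _ in docs]        # topic counts per document
    n_kw = [[0] * V for _ in range(K)]    # word counts per topic
    n_k = [0] * K                         # total words per topic
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):        # initialize counts randomly
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]               # remove this word's assignment
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                # Conditional over topics given all other assignments.
                p = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                     / (n_k[t] + V * beta) for t in range(K)]
                r = rng.random() * sum(p) # sample (not argmax!) from p
                k, acc = 0, p[0]
                while r >= acc:
                    k += 1; acc += p[k]
                z[d][i] = k               # add the sampled assignment back
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return z

docs = [[0, 0, 1], [0, 1, 1], [2, 3, 3], [2, 2, 3]]  # two obvious word groups
print(gibbs_lda(docs, V=4, K=2))
```

The decrement-sample-increment pattern is the "fix all others" step made concrete: the word being resampled is excluded from its own conditional.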

Page 27

Gibbs Sampling

Please read the paper carefully for the details. It is an easy-to-follow introduction to Gibbs sampling for LDA.

Page 28

Gibbs Sampling

• EM:
– Fix all parameters or settings.
– Compute the best (maximum-likelihood) values for all parameters or settings.
– Change to the new settings.
– Loop until convergence.

Page 29

Gibbs Sampling

• Neither Gibbs nor EM gives the exact best estimate!
• The exact best estimate would calculate the expectation of each random variable over all possible configurations (exponentially many), NOT the optimized expectation in the current state.

• But so far these are the best we can do.
• Neither is better or worse, in my personal view.

Page 30

Outline

• Bayesian Analysis
• Dirichlet Distribution
• Evolution of Topic Model
• Gibbs Sampling
• Intuition Analysis of Parameter Settings

Page 31

Parameter Settings

• Think first:
– α: smooths the prior probability of topics.
– β: smooths the probability of words appearing in a topic.

Page 32

Parameter Settings

• Higher β:
– Higher probability for rare words in a topic. Rare words survive more easily, so on average there are more words per topic.

• Higher α:
– Higher probability for small topics. Small topics survive more easily, so there are more topics in total.
– More topics per document.
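This intuition can be checked by sampling: a small symmetric Dirichlet parameter concentrates mass on a few components (sparse, decisive mixtures), while a large one spreads mass evenly. A sketch with invented helper names, again using the Gamma trick for Dirichlet sampling:

```python
import random

def sample_dirichlet(alpha, k, rng):
    """One sample from a symmetric Dirichlet(alpha) of dimension k."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]

def average_max_weight(alpha, k=10, trials=2000, seed=0):
    """Average largest component: near 1 -> sparse, near 1/k -> uniform."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng))
               for _ in range(trials)) / trials

print(average_max_weight(0.1))   # close to 1: one topic dominates a document
print(average_max_weight(10.0))  # close to 1/10: topics evenly mixed
```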

Page 33

Parameter Settings

• Multiple interpretations:
– Lower α and β result in more decisive topic associations; the words within a topic will be more similar.
– α controls the topic difference among documents.
– β controls the word similarity within a topic.
– Don't forget K, the largest number of topics you can have. Self-adaptation means you won't suffer from a bad K as in K-means, but you still need to decide roughly how many topics you need.

Page 34

Summary

• Bayesian Analysis: the prior-observation trade-off
• Dirichlet Distribution: a smoothing method
• Topic Model Evolution: why it works well
• Gibbs and EM: variable-inference methods
• Parameter Settings: how many topics, and how many words in a topic

Page 35

THANKS
Q&A