LDA Beginner's Tutorial

Latent Dirichlet Allocation (LDA) for ML-IR Discussion Group

Prepared by Wayne Tai Lee, Satpreet Singh

2013 LinkedIn Corporation. All Rights Reserved.

Latent Dirichlet Allocation: A Bayesian Unsupervised Learning Model

Roadmap

- Unsupervised learning
- Bayesian statistics
- Mixture models
- LDA theory and intuition
- LDA practice and applications

Unsupervised Learning
Learning patterns with no labels

- Clustering is a form of unsupervised learning
- Classification is a form of supervised learning
- Validation is difficult

How would you cluster?

Take home: validation is difficult. There is no true answer here.

Now try these ones!

Documents from Wikipedia

Clustering documents is difficult because many words are repeated across documents. Some documents may be similar to one another on different topics. So we might want a clustering that allows a document to belong to more than one cluster.

Bayesian Statistics
A framework to update your beliefs

- Probabilities as beliefs
- Update your beliefs as data is observed
- Requires a model that describes the data generation


Example: Evaluating Candidates

Candidate potential

Evidence: Schooling, Experience, Interview, Internship

How to update?!

Model for candidates

Model for data generation
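As a minimal sketch of the update step, suppose candidate potential is either High or Low and each piece of evidence (schooling, experience, interview, internship) is recorded as a pass or a fail. The prior of 0.5 and the pass probabilities 0.8 and 0.4 are made-up numbers for illustration, not values from the talk, and the observations are assumed independent given potential.

```python
def update_belief(prior_high, p_pass_given_high, p_pass_given_low, passed):
    """Return P(High | observation) via Bayes' rule for one pass/fail observation."""
    # Model for data generation: how likely is this observation under each potential?
    like_high = p_pass_given_high if passed else 1.0 - p_pass_given_high
    like_low = p_pass_given_low if passed else 1.0 - p_pass_given_low
    numerator = like_high * prior_high
    evidence = numerator + like_low * (1.0 - prior_high)
    return numerator / evidence

# Hypothetical prior belief and observations (illustration only)
belief = 0.5
observations = [
    ("schooling", True),
    ("experience", True),
    ("interview", False),
    ("internship", True),
]
for name, passed in observations:
    # The posterior after one observation becomes the prior for the next
    belief = update_belief(belief, 0.8, 0.4, passed)
    print(f"after {name}: P(High) = {belief:.3f}")
```

Each pass pushes the belief toward High and the failed interview pulls it back, which is the sense in which the belief is updated as data is observed.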

Mixture Models
A popular statistical model

An easy way to build hierarchical relationships

Mixture models visualized

Candidate quality: High or Low

2-stage process
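The two-stage process can be sketched in a few lines: first draw the hidden quality, then draw performance from the distribution that belongs to that quality. The mixture weights and the Gaussian means and standard deviations below are illustrative assumptions, not numbers from the talk.

```python
import random

# Hypothetical parameters (illustration only)
WEIGHTS = {"High": 0.3, "Low": 0.7}                        # mixture weights, sum to 1
PERFORMANCE = {"High": (80.0, 5.0), "Low": (55.0, 10.0)}   # (mean, std dev) per quality

def sample_candidate(rng):
    """Stage 1: pick quality by its weight. Stage 2: draw performance given quality."""
    quality = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    mean, std = PERFORMANCE[quality]
    return quality, rng.gauss(mean, std)

rng = random.Random(0)
samples = [sample_candidate(rng) for _ in range(10000)]
share_high = sum(q == "High" for q, _ in samples) / len(samples)
mean_performance = sum(x for _, x in samples) / len(samples)
print(f"share High: {share_high:.2f}, mean performance: {mean_performance:.1f}")
```

Throwing away the quality labels and keeping only the performance values gives draws from the marginal distribution of performance.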

Marginal Distribution of Candidate Performance: ignore quality


Distribution of candidate performance, annotated with the mixture weights
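Written out, ignoring quality means summing it out. The notation is assumed here for illustration: x is candidate performance, q is candidate quality, and \pi_q is the mixture weight P(q).

```latex
p(x) \;=\; \sum_{q \in \{\text{High},\,\text{Low}\}} \pi_q \, p(x \mid q)
     \;=\; \pi_{\text{High}}\, p(x \mid \text{High}) + \pi_{\text{Low}}\, p(x \mid \text{Low}),
\qquad \pi_{\text{High}} + \pi_{\text{Low}} = 1 .
```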

How are words in a document generated?

One possibility: each word comes from a different topic (bag of words: ignore order)

Mixture weight for topic k

Multinomial distribution over ALL words based on topic k

Example: the word "professional" is probably used more often in the professional network topic than in the social network topic.
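In symbols, with notation assumed here rather than taken from the slides (\theta_k for the mixture weight of topic k, \beta_{k,w} for the probability of word w under topic k), the probability of seeing word w is a weighted sum over topics:

```latex
p(W = w) \;=\; \sum_{k=1}^{K} \theta_k \, \beta_{k,w},
\qquad \sum_{k=1}^{K} \theta_k = 1,
\qquad \sum_{w} \beta_{k,w} = 1 \ \text{for each } k .
```

For the "professional" example, \beta_{k,w} with w = professional would be larger when k is the professional network topic than when k is the social network topic.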

Just a mixture model

Diagram: Word, drawn from Topic 1 through Topic K. Examples shown: Leadership, Big Data, Machine Learning.

1) Pick a topic

2) Pick a word

The chosen topic: Z

So we really want to know:
- Z (cluster for the word)
- the document composition
- the key words
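The two steps can be sketched as a tiny sampler. The topic weights, vocabularies, and probabilities below are invented for illustration; only the pick-a-topic, pick-a-word structure comes from the slides.

```python
import random

# Hypothetical topic weights for one document (the document composition)
TOPIC_WEIGHTS = {"Leadership": 0.5, "Big Data": 0.3, "Machine Learning": 0.2}

# Hypothetical word distribution for each topic (the key words)
TOPIC_WORDS = {
    "Leadership": {"team": 0.5, "vision": 0.4, "data": 0.1},
    "Big Data": {"data": 0.6, "hadoop": 0.3, "team": 0.1},
    "Machine Learning": {"model": 0.5, "data": 0.3, "training": 0.2},
}

def draw(distribution, rng):
    """Draw one key from a dict that maps outcomes to probabilities."""
    items = list(distribution)
    return rng.choices(items, weights=[distribution[i] for i in items])[0]

def generate_document(n_words, rng):
    """Generate (topic, word) pairs; word order carries no meaning (bag of words)."""
    doc = []
    for _ in range(n_words):
        z = draw(TOPIC_WEIGHTS, rng)     # 1) pick a topic
        w = draw(TOPIC_WORDS[z], rng)    # 2) pick a word from that topic
        doc.append((z, w))
    return doc

rng = random.Random(1)
document = generate_document(8, rng)
print(" ".join(w for _, w in document))
```

In practice only the words are observed. The chosen topics Z, the topic weights, and the per-topic word distributions all have to be inferred.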


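Review!

Graphical model: Z -> W

Plate diagram: Z_{d,n} -> W_{d,n}, with plates over k = 1, ..., K; n = 1, ..., N_d; d = 1, ..., D

K: number of topics
N_d: number of words in document d
D: number of documents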


Bayesian: but what about the distributions of the mixture weights and the topic word distributions?


The two Dirichlet parameters control the sparsity of the weights for the multinomial.

Implications: a priori we assume
- Topics have few key words
- Documents only have a small subset of topics


Dirichlet Distribution with Different Sparsity Parameters
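To see the sparsity effect numerically, the sketch below draws weight vectors from a symmetric Dirichlet with a small and a large concentration parameter. It uses the standard construction of normalizing independent Gamma draws. The parameter values 0.1 and 10 and the dimension 5 are arbitrary choices for illustration.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw one weight vector from a symmetric Dirichlet(alpha) of dimension k."""
    # Independent Gamma(alpha, 1) draws, normalized to sum to 1
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def average_largest_weight(alpha, k, n_samples, rng):
    """Average size of the biggest weight: near 1 means sparse, near 1/k means even."""
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n_samples)) / n_samples

rng = random.Random(0)
sparse = average_largest_weight(0.1, 5, 2000, rng)
dense = average_largest_weight(10.0, 5, 2000, rng)
print(f"alpha=0.1: largest weight averages {sparse:.2f}")
print(f"alpha=10:  largest weight averages {dense:.2f}")
```

With a small parameter most of the mass lands on one component, which matches the prior assumption that a document uses only a few topics and a topic has only a few key words.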


Latent Dirichlet Allocation!!!

Plate diagram: Z_{d,n} -> W_{d,n}, with plates over k = 1, ..., K and n = 1, ..., N_d
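The full generative model can be written as follows. The symbols \alpha, \eta, \theta_d, and \beta_k are assumed notation, following a convention common in the LDA literature; Z_{d,n} and W_{d,n} are the chosen topic and the observed word for position n of document d.

```latex
\begin{aligned}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
Z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
W_{d,n} \mid Z_{d,n}, \beta &\sim \mathrm{Categorical}(\beta_{Z_{d,n}}), & n &= 1,\dots,N_d
\end{aligned}
```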


How do we fit this model?

Want the posterior over the latent variables given the observed words.

Worst part of Bayesian analysis, personally speaking.
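Spelled out under assumed notation (\theta for the per-document topic weights, \beta for the per-topic word distributions, \alpha and \eta for their Dirichlet parameters), the posterior over everything unobserved given the words W is:

```latex
p(\theta, \beta, Z \mid W, \alpha, \eta)
  \;=\; \frac{p(\theta, \beta, Z, W \mid \alpha, \eta)}{p(W \mid \alpha, \eta)},
\qquad
p(W \mid \alpha, \eta) \;=\; \int\!\!\int \sum_{Z} p(\theta, \beta, Z, W \mid \alpha, \eta)\, d\theta\, d\beta .
```

The denominator couples \theta and \beta across all words and has no closed form, which is why fitting relies on approximate inference.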


Two main ways to get the posterior:
- Sampling methods
  - Asymptotically correct
  - Time consuming
  - Lots of black magic in sampling tricks
- Variational methods (practical solution!)
  - An approximation with no guarantees
  - Faster
  - Need math skills
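As one concrete instance of a sampling method, here is a minimal sketch of collapsed Gibbs sampling for LDA, a widely used sampler that resamples one word's topic at a time given all the other assignments. The talk does not say which sampler it has in mind, so treat the choice as an assumption. The hyperparameter values, the iteration count, and the toy corpus are also made up.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs is a list of lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                              # words assigned to topic k overall
    assignments = []

    # Start from a random topic for every word
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts
                k = assignments[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized probability of each topic given all other assignments
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + eta) / (topic_total[j] + vocab_size * eta)
                    for j in range(n_topics)
                ]
                # Resample the topic and put the word back into the counts
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assignments, doc_topic, topic_word

# Toy corpus: word ids 0-2 and word ids 3-5 never appear in the same document
docs = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5], [0, 2, 1, 1, 0, 2], [5, 4, 3, 3, 5, 4]]
assignments, doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=6)
for d, counts in enumerate(doc_topic):
    print(f"document {d}: topic counts {counts}")
```

Every sweep touches every word in the corpus, which gives a feel for why sampling is time consuming on large collections.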


Variational Bayes (specifically mean-field variational Bayes):
- What's crazy?
  - Assumes all the latent variables are independent
- What's not crazy?
  - Finds the best model within this crazy class
  - Best under KL divergence

Empirically, it has shown promising results!

For sufficient details: "Explaining Variational Approximations" by Ormerod and Wand
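In symbols, under assumed notation (\theta, \beta, and Z for the latent variables, W for the observed words, q for the approximating distribution), mean-field variational Bayes restricts q to a fully factorized form and picks the member of that class closest to the posterior in KL divergence:

```latex
q(\theta, \beta, Z) \;=\; \prod_{k=1}^{K} q(\beta_k) \prod_{d=1}^{D} \Big[ q(\theta_d) \prod_{n=1}^{N_d} q(Z_{d,n}) \Big],
\qquad
q^{*} \;=\; \arg\min_{q} \; \mathrm{KL}\big( q(\theta, \beta, Z) \,\|\, p(\theta, \beta, Z \mid W) \big).
```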


LDA Take Home

- An intuitively appealing Bayesian unsupervised learning model
- Training is difficult
  - Lots of packages exist; the main issue is scalability
- Validation is difficult
  - Usually cast into a supervised learning framework
- Presentation is difficult
  - Visualization for the Bayesian model is hard
