If you can't read please download the document
Upload
wayne-lee
View
18.984
Download
2
Embed Size (px)
Citation preview
Presentation Template Guidelines
Latent Dirichlet Allocation (LDA)- for ML-IR Discussion Group1
Prepared by Wayne Tai Lee, Satpreet Singh
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMELatent Dirichlet Allocation:A Bayesian Unsupervised Learning ModelRoadmap2
Unsupervised learningBayesian StatisticsMixture ModelsLDA theory and intuitionLDA practice and applications
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEUnsupervised LearningLearning patterns with no labels3
Clustering is a form of Unsupervised learning Classification is known as supervised learningValidation is difficult
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME4
How would you cluster?
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMETake home: validation is difficult.no true answer here.4
5Documents of wikipediaNow try these ones!
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEClustering documents is difficult because many repeated words are used. Some documents may be similar to one another on different topics. So we might want to cluster allowing membership.5
Bayesian StatisticsA framework to update your beliefs6
Probabilities as beliefsUpdates your belief as data is observedRequires a model that describes the data generation
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME7
Candidate potentialExample: Evaluating Candidates
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME8
Candidate potentialExample: Evaluating Candidates
Schooling
Experience
Interview
Internship
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME9
Candidate potentialExample: Evaluating Candidates
Schooling
Experience
Interview
Internship
How to update?!
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME10
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME11
Model for candidates
Model for data generation
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEMixture ModelsA popular statistical model12
An easy way to build hierarchical relationships
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEMixture models visualized13
Candidate QualityHighLow
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process13
14
Marginal Distribution of Candidate Performance: ignore quality
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME15
Distribution of Candidate Performance:
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME16
Distribution of Candidate Performance:
Mixture Weights
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME17
Mixture Weights
Distribution of Candidate Performance:
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME18
Distribution of Candidate Performance:
????
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEHow are words in a document generated?19
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEOne possibility:20Each word comes from different topics (bag of words: ignore order)
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME
20
How are words in a document generated?21
Each word comes from different topics
Mixture Weightfor Topic k
Multinomial Distributionover ALL words basedon topic k
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.21
Just a mixture model22
WordTopic 1Topic K
LeadershipBig DataMachine Learning
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process22
Just a mixture model23
WordTopic 1Topic K
LeadershipBig DataMachine Learning
1) Pick a topic
2) Pick a word
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process23
Just a mixture model24
WordTopic 1Topic K
LeadershipBig DataMachine Learning
The chosen Topic: Z
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process24
Just a mixture model25
WordTopic 1Topic K
LeadershipBig DataMachine Learning
So we really want to knowZ__
The chosen Topic: Z
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process25
Just a mixture model26
WordTopic 1Topic K
LeadershipBig DataMachine Learning
So we really want to knowZ (cluster for the word) (document composition) (key words)
The chosen Topic: Z
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME2 stage process26
Review!27
ZW
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.27
28
Zd,n
k=1KWd,n
n=1,,Ndd=1,,DK: number of topicsNd: number of wordsD: number of documents
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.28
29
Zd,n
k=1KWd,n
n=1,,Ndd=1,,DK: number of topicsNd: number of wordsD: number of documentsBayesian: But what about the distribution for and ??
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.29
30
Zd,n
k=1KWd,n
n=1,,Ndd=1,,DK: number of topicsNd: number of wordsD: number of documentsBayesian: But what about the distribution for and ??
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.30
31
and control the sparsity of the weights for the multinomial.Implications: a priori we assumeTopics have few key words Documents only have a small subset of topics
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.31
Dirichlet Distribution with Different Sparsity Parameters32
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME33
Latent Dirichlet Allocation!!!
Zd,n
k=1KWd,n
n=1,,Nd
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.33
34
How do we fit this model?
Want the posterior:
Worst part of Bayesian Analysis..personally speaking~
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.34
35
Two main ways to get posterior:Sampling methodsAsymtotically correctTime consumingLots of black magic in sampling tricksVariational methods (practical solution!)An approximation with no guaranteesFasterNeed math skills
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.35
36
Variational Bayes (specifically mean field variational bayes):Whats crazy?Assumes all the latent variables are independentWhats not crazy?Finds the best model within this crazy class.Best under KL divergence
Empirically have shown promising results!
For sufficient details:Explaining Variational Approximations by Ormerod and Wand
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAMEExample: the word usage of professional is probably higher in the topic of professional network than a social network.36
LDA Take Home
37An intuitively appealing Bayesian unsupervised learning modelTraining is difficultLots of packages exist, main issue is scalabilityValidation is difficultUsually cast into a supervised learning frameworkPresentation is difficultVisualization for the Bayesian model is hard.
2013 LinkedIn Corporation. All Rights Reserved.ORGANIZATION NAME