CS6140: Machine Learning, Spring 2017
Instructor: Lu Wang, College of Computer and Information Science
Northeastern University. Webpage: www.ccs.neu.edu/home/luwang
Email: [email protected]
Logistics
• Assignment 3 is due on 3/30.
• 4/13: course project presentation.
• 4/20: final exam.
What we learned last time
• Sequential labeling models
  – Hidden Markov Models
  – Maximum-entropy Markov model
  – Conditional Random Fields
Sample Markov Model for POS
[Figure: state-transition diagram over POS tags (start, Det, Noun, PropNoun, Verb, stop) with transition probabilities labeling the arcs]
The Markov Assumption
Hidden Markov Models (HMMs)
[Figure: observed words emitted from a hidden sequence of part-of-speech tags]
Formally
Viterbi Backtrace
[Figure: trellis over states s1 … sN (plus initial state s0 and final state sF) across time steps t1 … tT; the most likely state sequence is recovered by following backpointers from sF]
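The backtrace pictured above can be sketched in code. This is a minimal Viterbi decoder for a toy HMM; the states, vocabulary, and all probabilities below are illustrative (a subset of the tags in the earlier figure), not values from the slides.

```python
import numpy as np

# Toy HMM: 3 hidden POS states, 3 observable words. All numbers are made up.
states = ["Det", "Noun", "Verb"]
start_p = np.array([0.6, 0.3, 0.1])           # P(s_1)
trans_p = np.array([[0.1, 0.8, 0.1],          # P(s_t | s_{t-1})
                    [0.2, 0.3, 0.5],
                    [0.5, 0.4, 0.1]])
vocab = {"the": 0, "dog": 1, "runs": 2}
emit_p = np.array([[0.9, 0.05, 0.05],         # P(w_t | s_t)
                   [0.1, 0.7, 0.2],
                   [0.1, 0.2, 0.7]])

def viterbi(words):
    obs = [vocab[w] for w in words]
    T, N = len(obs), len(states)
    delta = np.zeros((T, N))                  # best score ending in state j at time t
    back = np.zeros((T, N), dtype=int)        # backpointers for the backtrace
    delta[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] * P(j | i) * P(word_t | j)
        scores = delta[t - 1][:, None] * trans_p * emit_p[:, obs[t]]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    # Backtrace: start from the best final state and follow stored argmaxes.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi(["the", "dog", "runs"]))  # → ['Det', 'Noun', 'Verb']
```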
Log-Linear Models
Using Log-Linear Models
Conditional Random Fields (CRFs)
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
[Some slides are borrowed from Christopher Bishop and David Sontag]
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
K-means Algorithm
• Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype (mean)
• Initialize prototypes, then iterate between two phases:
  – Step 1: assign each data point to the nearest prototype
  – Step 2: update prototypes to be the cluster means
• Simplest version is based on Euclidean distance
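The two-phase iteration above can be sketched in a few lines. This is a minimal K-means sketch with Euclidean distance; the synthetic data, K, and iteration count are illustrative choices, not from the slides.

```python
import numpy as np

# Two well-separated synthetic clusters around (0, 0) and (4, 4).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
K = 2

# Initialize prototypes to K randomly chosen data points.
mu = X[rng.choice(len(X), K, replace=False)]
for _ in range(20):
    # Step 1: assign each point to the nearest prototype (Euclidean distance).
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    z = d.argmin(axis=1)
    # Step 2: update each prototype to the mean of its assigned cluster.
    mu = np.array([X[z == k].mean(axis=0) for k in range(K)])

print(mu)  # one prototype near (0, 0), the other near (4, 4)
```

A fixed iteration count keeps the sketch short; in practice one would stop when the assignments no longer change.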
[Figures: K-means iterations, from BCS Summer School, Exeter, 2003, Christopher M. Bishop]
The Gaussian Distribution
• Multivariate Gaussian, parameterized by a mean and a covariance
Gaussian Mixtures
• Linear superposition of Gaussians
• Normalization and positivity require that the mixing coefficients are non-negative and sum to one
• Can interpret the mixing coefficients as prior probabilities
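The formulas accompanying these bullets did not survive extraction; the standard forms (following Bishop's notation) are:

```latex
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} \pi_k \,\mathcal{N}\!\left(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right),
\qquad \sum_{k=1}^{K}\pi_k = 1, \quad 0 \le \pi_k \le 1 .
```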
Example: Mixture of 3 Gaussians
Contours of Probability Distribution
Sampling from the Gaussian
• To generate a data point:
  – first pick one of the components with probability π_k
  – then draw a sample from that component
• Repeat these two steps for each new data point
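The two-step (ancestral) sampling procedure above is easy to write down. This is a sketch for a 1-D mixture; the mixing coefficients, means, and standard deviations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
pi = np.array([0.5, 0.3, 0.2])        # mixing coefficients (sum to 1)
mu = np.array([-4.0, 0.0, 5.0])       # component means (1-D for simplicity)
sigma = np.array([1.0, 0.5, 2.0])     # component standard deviations

def sample(n):
    # Step 1: pick a component index k with probability pi_k, for each draw.
    k = rng.choice(len(pi), size=n, p=pi)
    # Step 2: draw a sample from the chosen Gaussian component.
    return rng.normal(mu[k], sigma[k])

x = sample(10000)
print(x.mean())  # ≈ sum_k pi_k * mu_k = -1.0
```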
Synthetic Data Set
Synthetic Data Set Without Labels
Fitting the Gaussian Mixture
• We wish to invert this process: given the data set, find the corresponding parameters:
  – mixing coefficients
  – means
  – covariances
• If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster
• Problem: the data set is unlabelled
• We shall refer to the labels as latent (= hidden) variables
Synthetic Data Set Without Labels
Posterior Probabilities
• We can think of the mixing coefficients as prior probabilities for the components
• For a given value of x, we can evaluate the corresponding posterior probabilities, called responsibilities
• These are given from Bayes’ theorem by γ_k(x) = π_k N(x | μ_k, Σ_k) / Σ_j π_j N(x | μ_j, Σ_j)
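The Bayes'-theorem computation of responsibilities can be sketched directly. This uses a 1-D two-component mixture with illustrative parameters; the numerator is prior times likelihood, and the denominator normalizes over components.

```python
import numpy as np

pi = np.array([0.5, 0.5])     # mixing coefficients (priors over components)
mu = np.array([0.0, 4.0])     # component means (1-D, illustrative)
sigma = np.array([1.0, 1.0])  # component standard deviations

def gauss_pdf(x, m, s):
    # Univariate Gaussian density N(x | m, s^2).
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def responsibilities(x):
    # Bayes' theorem: gamma_k = pi_k * N(x | mu_k) / sum_j pi_j * N(x | mu_j).
    num = pi * gauss_pdf(x, mu, sigma)
    return num / num.sum()

print(responsibilities(2.0))  # midway between the means → [0.5 0.5]
```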
Posterior Probabilities (colour coded)
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
EM in General
• Consider an arbitrary distribution q over the latent variables (p is the true distribution)
• The following decomposition always holds
Decomposition
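The decomposition on this slide did not survive extraction; the standard identity in Bishop's notation is:

```latex
\ln p(\mathbf{X}\mid\boldsymbol{\theta})
  \;=\; \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p),
\quad\text{where}\quad
\mathcal{L}(q,\boldsymbol{\theta})
  = \sum_{\mathbf{Z}} q(\mathbf{Z})
    \ln\frac{p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})}{q(\mathbf{Z})},
\qquad
\mathrm{KL}(q\,\|\,p)
  = -\sum_{\mathbf{Z}} q(\mathbf{Z})
    \ln\frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta})}{q(\mathbf{Z})} .
```

Since KL(q‖p) ≥ 0, the term L(q, θ) is a lower bound on the log likelihood, which is what the E- and M-steps optimize.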
Optimizing the Bound
• E-step: maximize the bound with respect to q
  – equivalent to minimizing the KL divergence
  – sets q equal to the posterior distribution
• M-step: maximize the bound with respect to the parameters
  – equivalent to maximizing the expected complete-data log likelihood
• Each EM cycle must increase the incomplete-data likelihood unless already at a (local) maximum
E-step
M-step
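The E- and M-steps above can be sketched concretely for a Gaussian mixture. This is a compact EM loop for a 1-D two-component mixture on synthetic data; the data, initialization, and iteration count are illustrative.

```python
import numpy as np

# Synthetic data: two Gaussian clusters with true means -2 and 3.
rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Deliberately rough initial parameters.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: responsibilities = posterior of each component for each point.
    g = pi * gauss(X[:, None], mu, var)
    g /= g.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted ML updates (maximize the expected
    # complete-data log likelihood).
    Nk = g.sum(axis=0)
    mu = (g * X[:, None]).sum(axis=0) / Nk
    var = (g * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(X)

print(mu)  # close to the true means -2 and 3
```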
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
[Slides are based on David Blei’s ICML 2012 tutorial]
Generative model for a document in LDA
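LDA's generative process for a single document draws per-document topic proportions θ from a Dirichlet prior, then for each word draws a topic assignment z from θ and a word from that topic's word distribution. This is a sketch of that process; the vocabulary, topic-word distributions, and Dirichlet parameter are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["ball", "game", "vote", "law"]
alpha = np.array([0.5, 0.5])                 # Dirichlet prior over 2 topics
beta = np.array([[0.45, 0.45, 0.05, 0.05],   # topic 0: "sports" words
                 [0.05, 0.05, 0.45, 0.45]])  # topic 1: "politics" words

def generate_document(n_words):
    theta = rng.dirichlet(alpha)             # per-document topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choice(len(alpha), p=theta)  # topic assignment for this word
        words.append(vocab[rng.choice(len(vocab), p=beta[z])])
    return words

print(generate_document(8))
```

Because θ is drawn per document, a document can mix both topics, which is the "admixture" property the next slide contrasts with plain mixture models.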
Comparison of mixture and admixture models
Usage of LDA
EM for mixture models
What We Learned Today
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
Homework
• Reading: Murphy 11.1-11.2, 11.4.1-11.4.4, 27.1-27.3
• More about EM
  – http://cs229.stanford.edu/notes/cs229-notes7b.pdf
  – http://cs229.stanford.edu/notes/cs229-notes8.pdf
• More about LDA
  – http://menome.com/wp/wp-content/uploads/2014/12/Blei2011.pdf
  – http://obphio.us/pdfs/lda_tutorial.pdf