CS6140: Machine Learning, Spring 2017
Instructor: Lu Wang, College of Computer and Information Science
Northeastern University. Webpage: www.ccs.neu.edu/home/luwang
Email: [email protected]
Logistics
• Assignment 3 is due on 3/30.
• 4/13: course project presentation.
• 4/20: final exam.
What we learned last time
• Sequential labeling models
  – Hidden Markov Models
  – Maximum-entropy Markov model
  – Conditional Random Fields
Sample Markov Model for POS
[Figure: state-transition diagram over POS tags (start, Det, Noun, PropNoun, Verb, stop) with transition probabilities labeling the arcs]
The Markov Assumption
Hidden Markov Models (HMMs)
[Figure: observed words emitted from a hidden sequence of part-of-speech tags]
Formally
Viterbi Backtrace
[Figure: trellis over states s1 … sN (plus initial state s0 and final state sF) across time steps t1 … tT; the most likely state sequence is recovered by following backpointers from sF]
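The backtrace pictured above can be sketched in code. This is a minimal Viterbi decoder for a toy HMM; the states, vocabulary, and all probabilities below are illustrative (a subset of the tags in the earlier figure), not values from the slides.

```python
import numpy as np

# Toy HMM: 3 hidden POS states, 3 observable words. All numbers are made up.
states = ["Det", "Noun", "Verb"]
start_p = np.array([0.6, 0.3, 0.1])           # P(s_1)
trans_p = np.array([[0.1, 0.8, 0.1],          # P(s_t | s_{t-1})
                    [0.2, 0.3, 0.5],
                    [0.5, 0.4, 0.1]])
vocab = {"the": 0, "dog": 1, "runs": 2}
emit_p = np.array([[0.9, 0.05, 0.05],         # P(w_t | s_t)
                   [0.1, 0.7, 0.2],
                   [0.1, 0.2, 0.7]])

def viterbi(words):
    obs = [vocab[w] for w in words]
    T, N = len(obs), len(states)
    delta = np.zeros((T, N))                  # best score ending in state j at time t
    back = np.zeros((T, N), dtype=int)        # backpointers for the backtrace
    delta[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] * P(j | i) * P(word_t | j)
        scores = delta[t - 1][:, None] * trans_p * emit_p[:, obs[t]]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    # Backtrace: start from the best final state and follow stored argmaxes.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi(["the", "dog", "runs"]))  # → ['Det', 'Noun', 'Verb']
```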
Log-Linear Models
Using Log-Linear Models
Conditional Random Fields (CRFs)
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
[Some slides are borrowed from Christopher Bishop and David Sontag]
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
K-means Algorithm
• Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype (mean)
• Initialize prototypes, then iterate between two phases:
  – Step 1: assign each data point to the nearest prototype
  – Step 2: update prototypes to be the cluster means
• Simplest version is based on Euclidean distance
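The two-phase iteration above can be sketched in a few lines. This is a minimal K-means sketch with Euclidean distance; the synthetic data, K, and iteration count are illustrative choices, not from the slides.

```python
import numpy as np

# Two well-separated synthetic clusters around (0, 0) and (4, 4).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
K = 2

# Initialize prototypes to K randomly chosen data points.
mu = X[rng.choice(len(X), K, replace=False)]
for _ in range(20):
    # Step 1: assign each point to the nearest prototype (Euclidean distance).
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    z = d.argmin(axis=1)
    # Step 2: update each prototype to the mean of its assigned cluster.
    mu = np.array([X[z == k].mean(axis=0) for k in range(K)])

print(mu)  # one prototype near (0, 0), the other near (4, 4)
```

A fixed iteration count keeps the sketch short; in practice one would stop when the assignments no longer change.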
[Figures: K-means iterations, from BCS Summer School, Exeter, 2003, Christopher M. Bishop]
The Gaussian Distribution
• Multivariate Gaussian, parameterized by a mean and a covariance
Gaussian Mixtures
• Linear superposition of Gaussians
• Normalization and positivity require that the mixing coefficients are non-negative and sum to one
• Can interpret the mixing coefficients as prior probabilities
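The formulas accompanying these bullets did not survive extraction; the standard forms (following Bishop's notation) are:

```latex
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} \pi_k \,\mathcal{N}\!\left(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right),
\qquad \sum_{k=1}^{K}\pi_k = 1, \quad 0 \le \pi_k \le 1 .
```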
Example: Mixture of 3 Gaussians
Contours of Probability Distribution
Sampling from the Gaussian
• To generate a data point:
  – first pick one of the components with probability π_k
  – then draw a sample from that component
• Repeat these two steps for each new data point
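The two-step (ancestral) sampling procedure above is easy to write down. This is a sketch for a 1-D mixture; the mixing coefficients, means, and standard deviations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
pi = np.array([0.5, 0.3, 0.2])        # mixing coefficients (sum to 1)
mu = np.array([-4.0, 0.0, 5.0])       # component means (1-D for simplicity)
sigma = np.array([1.0, 0.5, 2.0])     # component standard deviations

def sample(n):
    # Step 1: pick a component index k with probability pi_k, for each draw.
    k = rng.choice(len(pi), size=n, p=pi)
    # Step 2: draw a sample from the chosen Gaussian component.
    return rng.normal(mu[k], sigma[k])

x = sample(10000)
print(x.mean())  # ≈ sum_k pi_k * mu_k = -1.0
```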
Synthetic Data Set
Synthetic Data Set Without Labels
Fitting the Gaussian Mixture
• We wish to invert this process: given the data set, find the corresponding parameters:
  – mixing coefficients
  – means
  – covariances
• If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster
• Problem: the data set is unlabelled
• We shall refer to the labels as latent (= hidden) variables
Synthetic Data Set Without Labels
Posterior Probabilities
• We can think of the mixing coefficients as prior probabilities for the components
• For a given value of x, we can evaluate the corresponding posterior probabilities, called responsibilities
• These are given from Bayes’ theorem by γ_k(x) = π_k N(x | μ_k, Σ_k) / Σ_j π_j N(x | μ_j, Σ_j)
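The Bayes'-theorem computation of responsibilities can be sketched directly. This uses a 1-D two-component mixture with illustrative parameters; the numerator is prior times likelihood, and the denominator normalizes over components.

```python
import numpy as np

pi = np.array([0.5, 0.5])     # mixing coefficients (priors over components)
mu = np.array([0.0, 4.0])     # component means (1-D, illustrative)
sigma = np.array([1.0, 1.0])  # component standard deviations

def gauss_pdf(x, m, s):
    # Univariate Gaussian density N(x | m, s^2).
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def responsibilities(x):
    # Bayes' theorem: gamma_k = pi_k * N(x | mu_k) / sum_j pi_j * N(x | mu_j).
    num = pi * gauss_pdf(x, mu, sigma)
    return num / num.sum()

print(responsibilities(2.0))  # midway between the means → [0.5 0.5]
```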
Posterior Probabilities (colour coded)
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
EM in General
• Consider an arbitrary distribution q over the latent variables (p is the true distribution)
• The following decomposition always holds
Decomposition
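The decomposition on this slide did not survive extraction; the standard identity in Bishop's notation is:

```latex
\ln p(\mathbf{X}\mid\boldsymbol{\theta})
  \;=\; \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p),
\quad\text{where}\quad
\mathcal{L}(q,\boldsymbol{\theta})
  = \sum_{\mathbf{Z}} q(\mathbf{Z})
    \ln\frac{p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})}{q(\mathbf{Z})},
\qquad
\mathrm{KL}(q\,\|\,p)
  = -\sum_{\mathbf{Z}} q(\mathbf{Z})
    \ln\frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta})}{q(\mathbf{Z})} .
```

Since KL(q‖p) ≥ 0, the term L(q, θ) is a lower bound on the log likelihood, which is what the E- and M-steps optimize.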
Optimizing the Bound
• E-step: maximize the bound with respect to q
  – equivalent to minimizing the KL divergence
  – sets q equal to the posterior distribution
• M-step: maximize the bound with respect to the parameters
  – equivalent to maximizing the expected complete-data log likelihood
• Each EM cycle must increase the incomplete-data likelihood unless already at a (local) maximum
E-step
M-step
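The E- and M-steps above can be sketched concretely for a Gaussian mixture. This is a compact EM loop for a 1-D two-component mixture on synthetic data; the data, initialization, and iteration count are illustrative.

```python
import numpy as np

# Synthetic data: two Gaussian clusters with true means -2 and 3.
rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Deliberately rough initial parameters.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: responsibilities = posterior of each component for each point.
    g = pi * gauss(X[:, None], mu, var)
    g /= g.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted ML updates (maximize the expected
    # complete-data log likelihood).
    Nk = g.sum(axis=0)
    mu = (g * X[:, None]).sum(axis=0) / Nk
    var = (g * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(X)

print(mu)  # close to the true means -2 and 3
```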
Today’s Outline
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
[Slides are based on David Blei’s ICML 2012 tutorial]
Generative model for a document in LDA
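LDA's generative process for a single document draws per-document topic proportions θ from a Dirichlet prior, then for each word draws a topic assignment z from θ and a word from that topic's word distribution. This is a sketch of that process; the vocabulary, topic-word distributions, and Dirichlet parameter are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["ball", "game", "vote", "law"]
alpha = np.array([0.5, 0.5])                 # Dirichlet prior over 2 topics
beta = np.array([[0.45, 0.45, 0.05, 0.05],   # topic 0: "sports" words
                 [0.05, 0.05, 0.45, 0.45]])  # topic 1: "politics" words

def generate_document(n_words):
    theta = rng.dirichlet(alpha)             # per-document topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choice(len(alpha), p=theta)  # topic assignment for this word
        words.append(vocab[rng.choice(len(vocab), p=beta[z])])
    return words

print(generate_document(8))
```

Because θ is drawn per document, a document can mix both topics, which is the "admixture" property the next slide contrasts with plain mixture models.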
Comparison of mixture and admixture models
Usage of LDA
EM for mixture models
What We Learned Today
• Bayesian Networks
• Mixture Models
• Expectation Maximization
• Latent Dirichlet Allocation
Homework
• Reading: Murphy 11.1-11.2, 11.4.1-11.4.4, 27.1-27.3
• More about EM
  – http://cs229.stanford.edu/notes/cs229-notes7b.pdf
  – http://cs229.stanford.edu/notes/cs229-notes8.pdf
• More about LDA
  – http://menome.com/wp/wp-content/uploads/2014/12/Blei2011.pdf
  – http://obphio.us/pdfs/lda_tutorial.pdf