Lecture 2: Monte Carlo Methods. Dahua Lin, The Chinese University of Hong Kong.

MLPI Lecture 2: Monte Carlo Methods (Basics)


Page 1: MLPI Lecture 2: Monte Carlo Methods (Basics)

Lecture 2

Monte Carlo Methods

Dahua Lin

The Chinese University of Hong Kong

Page 2: MLPI Lecture 2: Monte Carlo Methods (Basics)

Monte Carlo Methods

Monte Carlo methods are a large family of computational algorithms that rely on random sampling. These methods are mainly used for

• Numerical integration

• Stochastic optimization

• Characterizing distributions

Page 3: MLPI Lecture 2: Monte Carlo Methods (Basics)

Expectations in Statistical Analysis

Computing expectations is perhaps the most common operation in statistical analysis.

• Computing the normalization factor of a posterior distribution:

$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}$

Page 4: MLPI Lecture 2: Monte Carlo Methods (Basics)

Expectations in Statistical Analysis (cont'd)

• Computing marginalization: $p(x) = \int p(x, z)\, dz$

• Computing expectations of functions: $\mathbb{E}[f(X)] = \int f(x)\, p(x)\, dx$

Page 5: MLPI Lecture 2: Monte Carlo Methods (Basics)

Computing Expectation

• Generally, an expectation can be written as $\mathbb{E}_p[f]$.

• For a discrete space: $\mathbb{E}_p[f] = \sum_{x} p(x)\, f(x)$

• For a continuous space: $\mathbb{E}_p[f] = \int p(x)\, f(x)\, dx$

Page 6: MLPI Lecture 2: Monte Carlo Methods (Basics)

Law of Large Numbers

• (Strong) Law of Large Numbers (LLN): Let $X_1, X_2, \ldots$ be i.i.d. random variables and $f$ be a measurable function. Then

$\frac{1}{n} \sum_{i=1}^{n} f(X_i) \to \mathbb{E}[f(X_1)] \quad \text{almost surely, as } n \to \infty.$

Page 7: MLPI Lecture 2: Monte Carlo Methods (Basics)

Monte Carlo Integration

• We can use the sample mean to approximate the expectation:

$\mathbb{E}_p[f] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i), \quad x_i \sim p \ \text{i.i.d.}$

• How many samples are enough?
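As a concrete illustration (my own Python sketch, not from the slides), the sample mean over i.i.d. draws approximates $\mathbb{E}_p[f]$; here $f(x) = x^2$ under a uniform distribution, whose exact expectation is $1/3$:

```python
import random

def mc_expectation(f, sampler, n):
    """Approximate E[f(X)] by the sample mean over n draws from `sampler`."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# Estimate E[X^2] for X ~ Uniform(0, 1); the exact value is 1/3.
est = mc_expectation(lambda x: x * x, random.random, 100_000)
print(est)  # close to 1/3
```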

Page 8: MLPI Lecture 2: Monte Carlo Methods (Basics)

Variance of Sample Mean

• By the Central Limit Theorem (CLT):

$\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \mathbb{E}[f(X)] \right) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$

Here, $\sigma^2$ is the variance of $f(X)$.

• The variance of the sample mean is $\sigma^2 / n$. The number of samples required to attain a certain variance $\varepsilon$ is at least $\sigma^2 / \varepsilon$.
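The $\sigma^2 / n$ scaling can be checked empirically. A small sketch (my own, not from the slides): for means of Uniform(0,1) draws, $\sigma^2 = 1/12$, so increasing $n$ tenfold should shrink the variance of the sample mean roughly tenfold:

```python
import random
import statistics

def sample_mean_var(n, trials=2000):
    """Empirical variance of the mean of n Uniform(0,1) draws, over many trials."""
    means = []
    for _ in range(trials):
        means.append(sum(random.random() for _ in range(n)) / n)
    return statistics.variance(means)

random.seed(1)
v10, v100 = sample_mean_var(10), sample_mean_var(100)
# Theory: Var(mean) = sigma^2 / n with sigma^2 = 1/12, so the ratio is near 10.
print(v10 / v100)
```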

Page 9: MLPI Lecture 2: Monte Carlo Methods (Basics)

Random Number Generation

• All sampling methods rely on a stream of random numbers to construct random samples.

• "True" random numbers are difficult to obtain. A more widely used approach is to use computational algorithms to produce long sequences of apparently random numbers, called pseudorandom numbers.

Page 10: MLPI Lecture 2: Monte Carlo Methods (Basics)

Random Number Generation (cont'd)

• The sequence of pseudorandom numbers is determined by a seed.

• If a randomized simulation is based on a single random stream, it can be exactly reproduced by fixing the seed initially.
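A minimal Python sketch of this reproducibility property (my own illustration; the function name `simulate` is hypothetical):

```python
import random

def simulate(seed):
    """A randomized simulation driven by a single random stream."""
    rng = random.Random(seed)          # a private stream, seeded explicitly
    return [rng.random() for _ in range(5)]

# Fixing the seed makes the whole run exactly reproducible.
run1 = simulate(42)
run2 = simulate(42)
print(run1 == run2)  # prints True
```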

Page 11: MLPI Lecture 2: Monte Carlo Methods (Basics)

RNGs

• Linear Congruential Generator (LCG):

$x_{n+1} = (a\, x_n + c) \bmod m.$

• The built-in generator of C and Java.

• Useful for simple randomized programs.

• Not good enough for serious Monte Carlo simulations.
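An LCG fits in a few lines. This sketch (mine, not from the slides) uses the constants of the ANSI C reference implementation of `rand()`, chosen here purely for illustration:

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Minimal linear congruential generator: x_{n+1} = (a*x_n + c) mod m.
    Constants follow the ANSI C reference rand() (an illustrative choice)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m  # scale the state to [0, 1)

gen = lcg(seed=7)
print([round(next(gen), 4) for _ in range(3)])
```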

Page 12: MLPI Lecture 2: Monte Carlo Methods (Basics)

RNGs

• Mersenne Twister (MT):

• Passes the Diehard tests, good enough for most Monte Carlo experiments

• Provided by C++11 or Boost

• Default RNG for MATLAB, NumPy, Julia, and many other numerical software packages

• Not amenable to parallel use

Page 13: MLPI Lecture 2: Monte Carlo Methods (Basics)

RNGs

• Xorshift1024:

• Proposed in 2014

• Passes BigCrush

• Incredibly simple (about 5 or 6 lines of C code)

Page 14: MLPI Lecture 2: Monte Carlo Methods (Basics)

Sampling a Discrete Distribution

• Linear search: $O(n)$ per sample

• Sorted search: still $O(n)$ per sample in the worst case,

• but much faster when prob. mass concentrates on a few values

• Binary search over cumulative sums: $O(\log n)$ per sample,

• but each step is a bit more expensive

• Can we do better?
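The binary-search approach can be sketched in Python with the standard `bisect` module (my own illustration, not from the slides): build the cumulative sums once, then each draw is one $O(\log n)$ search:

```python
import bisect
import itertools
import random

def make_sampler(probs, rng):
    """Sample a discrete distribution by binary search over the CDF:
    O(n) preprocessing, O(log n) per sample."""
    cdf = list(itertools.accumulate(probs))
    def sample():
        # cdf[-1] handles unnormalized weights as well
        return bisect.bisect_left(cdf, rng.random() * cdf[-1])
    return sample

rng = random.Random(0)
sample = make_sampler([0.1, 0.2, 0.3, 0.4], rng)
draws = [sample() for _ in range(100_000)]
print(draws.count(3) / len(draws))  # should be near 0.4
```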

Page 15: MLPI Lecture 2: Monte Carlo Methods (Basics)

Sampling a Discrete Distribution

• Huffman coding:

• $O(n \log n)$ preprocessing + expected $O(H + 1)$ steps per sample, where $H$ is the entropy of the distribution

• Alias method (by A. J. Walker):

• $O(n)$ preprocessing + $O(1)$ per sample
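The alias method achieves $O(1)$ per draw by splitting each of $n$ equal-width columns between at most two outcomes. Below is a sketch of Vose's well-known variant of Walker's construction (my own code, not from the slides):

```python
import random

def build_alias(probs):
    """Vose's variant of Walker's alias method: O(n) table construction."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l   # column s: keep s w.p. prob[s], else l
        scaled[l] -= 1.0 - scaled[s]       # l donated the rest of column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                # leftovers are (numerically) exactly 1
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng):
    """O(1) per draw: pick a column uniformly, then flip one biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

rng = random.Random(0)
prob, alias = build_alias([0.1, 0.2, 0.3, 0.4])
draws = [alias_sample(prob, alias, rng) for _ in range(100_000)]
print([round(draws.count(k) / len(draws), 3) for k in range(4)])
```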

Page 16: MLPI Lecture 2: Monte Carlo Methods (Basics)

Transform Sampling

• Let $F$ be the cumulative distribution function (cdf) of a distribution $D$. Let $U \sim \mathrm{Uniform}(0, 1)$; then $F^{-1}(U) \sim D$.

• Why? $\Pr(F^{-1}(U) \le x) = \Pr(U \le F(x)) = F(x)$.

• How to generate an exponentially distributed sample?
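For the exponential distribution the inverse cdf is available in closed form, so transform sampling is one line. A sketch (my own, not from the slides):

```python
import math
import random

def sample_exponential(lam, rng):
    """Inverse-transform sampling: F(x) = 1 - exp(-lam*x), so
    F^{-1}(u) = -ln(1 - u) / lam  (equivalently -ln(u)/lam, since 1-U ~ U)."""
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(0)
xs = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean_x = sum(xs) / len(xs)
print(mean_x)  # should be near 1/lam = 0.5
```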

Page 17: MLPI Lecture 2: Monte Carlo Methods (Basics)

Sampling from Multivariate Normal

How to draw a sample from a multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$?

(Algorithm)

1. Perform Cholesky decomposition to get $L$ s.t. $L L^T = \Sigma$.

2. Generate a vector $z$ comprised of i.i.d. values from $\mathcal{N}(0, 1)$.

3. Let $x = \mu + L z$.
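The three steps above can be sketched in pure Python for the 2-dimensional case (my own illustration; a real implementation would use a linear-algebra library for general $d$):

```python
import math
import random

def chol2(sigma):
    """Cholesky factor L (lower triangular) of a 2x2 covariance: L L^T = sigma."""
    a, b, d = sigma[0][0], sigma[0][1], sigma[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def sample_mvn(mu, L, rng):
    """x = mu + L z with z ~ iid N(0, 1)."""
    z = [rng.gauss(0, 1), rng.gauss(0, 1)]
    return [mu[0] + L[0][0] * z[0],
            mu[1] + L[1][0] * z[0] + L[1][1] * z[1]]

rng = random.Random(0)
L = chol2([[2.0, 0.6], [0.6, 1.0]])
xs = [sample_mvn([1.0, -1.0], L, rng) for _ in range(50_000)]
mean0 = sum(x[0] for x in xs) / len(xs)
print(mean0)  # should be near mu[0] = 1.0
```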

Page 18: MLPI Lecture 2: Monte Carlo Methods (Basics)

Rejection Sampling

• (Rejection sampling algorithm): To sample from a distribution $p$, for which $p(x) \le c\, q(x)$ for some proposal $q$ and constant $c$:

1. Sample $x \sim q$

2. Accept $x$ with probability $p(x) / (c\, q(x))$.

• Why does this algorithm yield the desired distribution?
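A sketch of the two steps (my own Python illustration, not from the slides): the target is the Beta(2,2) density $p(x) = 6x(1-x)$ on $[0,1]$, the proposal is uniform, and since $p$ is bounded by $1.5$ the envelope constant $c = 1.5$ works:

```python
import random

def rejection_sample(p_density, q_sample, q_density, c, rng):
    """Rejection sampling: propose x ~ q, accept with prob p(x) / (c * q(x)).
    Requires p(x) <= c * q(x) everywhere; acceptance rate is 1/c here."""
    while True:
        x = q_sample(rng)
        if rng.random() < p_density(x) / (c * q_density(x)):
            return x

rng = random.Random(0)
p = lambda x: 6.0 * x * (1.0 - x)        # Beta(2, 2) density on [0, 1]
xs = [rejection_sample(p, lambda r: r.random(), lambda x: 1.0, 1.5, rng)
      for _ in range(50_000)]
est = sum(xs) / len(xs)
print(est)  # Beta(2, 2) has mean 0.5
```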

Page 19: MLPI Lecture 2: Monte Carlo Methods (Basics)

Rejection Sampling (cont'd)

• What is the acceptance rate?

• What are the problems of this method?

Page 20: MLPI Lecture 2: Monte Carlo Methods (Basics)

Importance Sampling

• Basic idea: generate samples from an easier distribution $q$, which is often referred to as the proposal distribution, and then reweight the samples.

Page 21: MLPI Lecture 2: Monte Carlo Methods (Basics)

Importance Sampling (cont'd)

• Let $p$ be the target distribution and $q$ be the proposal distribution with $q(x) > 0$ whenever $p(x) > 0$, and let the importance weight be $w(x) = p(x) / q(x)$. Then

$\mathbb{E}_p[f] = \mathbb{E}_q[f(X)\, w(X)]$

• We can approximate $\mathbb{E}_p[f]$ with

$\frac{1}{n} \sum_{i=1}^{n} f(x_i)\, w(x_i), \quad x_i \sim q \ \text{i.i.d.}$
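A small sketch of this estimator (my own, not from the slides): the target is $\mathcal{N}(0,1)$, the proposal the wider $\mathcal{N}(0,2)$, and we estimate $\mathbb{E}_p[X^2] = 1$ as the sample mean of $f(x)\,w(x)$ under $q$:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

rng = random.Random(0)
n = 200_000
total = 0.0
for _ in range(n):
    x = rng.gauss(0, 2)                                  # x ~ q = N(0, 2)
    w = normal_pdf(x, 0, 1) / normal_pdf(x, 0, 2)        # w = p(x) / q(x)
    total += (x * x) * w                                 # f(x) * w(x)
est = total / n
print(est)  # should be near E_p[X^2] = 1
```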

Page 22: MLPI Lecture 2: Monte Carlo Methods (Basics)

Importance Sampling (cont'd)

• By the strong law of large numbers, we have

$\frac{1}{n} \sum_{i=1}^{n} f(x_i)\, w(x_i) \to \mathbb{E}_p[f] \quad \text{almost surely.}$

• How to choose a good proposal distribution $q$?

Page 23: MLPI Lecture 2: Monte Carlo Methods (Basics)

Variance of Importance Sampling

• The variance of the importance sampling estimate is

$\mathrm{Var}\!\left[\frac{1}{n} \sum_{i=1}^{n} f(x_i)\, w(x_i)\right] = \frac{1}{n} \left( \mathbb{E}_q[(f w)^2] - \mathbb{E}_p[f]^2 \right)$

• The 2nd term does not depend on $q$, while the 1st term has

$\mathbb{E}_q[(f w)^2] \ge \left( \mathbb{E}_q[|f|\, w] \right)^2 = \left( \mathbb{E}_p[|f|] \right)^2$

Page 24: MLPI Lecture 2: Monte Carlo Methods (Basics)

Optimal Proposal

• The lower bound is attained when:

$q(x) \propto |f(x)|\, p(x)$

• The optimal proposal distribution is generally difficult to sample from.

• However, this analysis leads us to an insight: we can achieve high sampling efficiency by emphasizing regions where the value of $|f(x)|\, p(x)$ is high.

Page 25: MLPI Lecture 2: Monte Carlo Methods (Basics)

Adaptive Importance Sampling

• Basic idea: learn to do sampling

• Choose the proposal from a tractable family $\{q_\theta\}$.

• Objective: minimize the sample-mean estimate of the estimator's variance, updating the parameter $\theta$ accordingly.

Page 26: MLPI Lecture 2: Monte Carlo Methods (Basics)

Self-Normalized Weights

• In many practical cases, $p$ is known only up to a normalizing constant, i.e. $p(x) = \tilde p(x) / Z$.

• For such cases, we may approximate $\mathbb{E}_p[f]$ with

$\sum_{i=1}^{n} \tilde w_i\, f(x_i), \quad \text{where } \tilde w_i = \frac{w(x_i)}{\sum_{j=1}^{n} w(x_j)}, \quad w(x) = \frac{\tilde p(x)}{q(x)}$

• Here, $\tilde w_i$ is called the self-normalized weight.
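A sketch of self-normalized importance sampling (my own, not from the slides): the target is an unnormalized standard normal $\tilde p(x) = e^{-x^2/2}$, the proposal is Uniform($-5, 5$), and $Z$ is never computed because it cancels in the normalized weights:

```python
import math
import random

rng = random.Random(0)
n = 200_000
xs = [rng.uniform(-5, 5) for _ in range(n)]
ws = [math.exp(-x * x / 2) / 0.1 for x in xs]      # w = p_tilde(x) / q(x), q = 1/10
total_w = sum(ws)
# Self-normalized estimate of E_p[X^2]; for N(0, 1) the exact value is 1.
est = sum(w * x * x for w, x in zip(ws, xs)) / total_w
print(est)  # should be near 1.0
```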

Page 27: MLPI Lecture 2: Monte Carlo Methods (Basics)

Self-Normalized Weights (cont'd)

Note:

$\mathbb{E}_q[w(X)] = \int \frac{\tilde p(x)}{q(x)}\, q(x)\, dx = Z$

By the strong law of large numbers, we have

$\frac{\sum_{i=1}^{n} w(x_i)\, f(x_i)}{\sum_{j=1}^{n} w(x_j)} \to \mathbb{E}_p[f] \quad \text{almost surely.}$

Page 28: MLPI Lecture 2: Monte Carlo Methods (Basics)

MCMC: Motivation

Simple strategies like transform sampling, rejection sampling, and importance sampling all rely on drawing independent samples from $p$ or from a proposal distribution over the sample space.

This can become very difficult (if not impossible) for complex distributions.

Page 29: MLPI Lecture 2: Monte Carlo Methods (Basics)

MCMC: Overview

• Markov Chain Monte Carlo (MCMC) explores the sample space through an ergodic Markov chain whose equilibrium distribution is the target distribution $\pi$.

• Many important samplers belong to MCMC, e.g. Gibbs sampling, the Metropolis-Hastings algorithm, and slice sampling.

Page 30: MLPI Lecture 2: Monte Carlo Methods (Basics)

Markov Processes

• A Markov process is a stochastic process where the future depends only on the present and not on the past.

• A sequence of random variables $X_0, X_1, X_2, \ldots$ defined on a measurable space $\Omega$ is called a (discrete-time) Markov process if it satisfies the Markov property:

$\Pr(X_{n+1} \in A \mid X_0, \ldots, X_n) = \Pr(X_{n+1} \in A \mid X_n)$

Page 31: MLPI Lecture 2: Monte Carlo Methods (Basics)

Countable Markov Chain

We first review the formulation and properties of Markov chains under a simple setting, where $\Omega$ is a countable space. We will later extend the analysis to more general spaces.

Page 32: MLPI Lecture 2: Monte Carlo Methods (Basics)

Homogeneous Markov Chains

A homogeneous Markov chain on a countable space $\Omega$, denoted by $(X_n)$, is characterized by an initial distribution $\mu_0$ and a transition probability matrix (TPM), denoted by $P$, such that

• $\Pr(X_0 = x) = \mu_0(x)$, and

• $\Pr(X_{n+1} = y \mid X_n = x) = P(x, y)$.

Page 33: MLPI Lecture 2: Monte Carlo Methods (Basics)

Stochastic Matrix

The transition probability matrix $P$ is a stochastic matrix, namely it is nonnegative and each of its rows sums to one:

$\sum_{y \in \Omega} P(x, y) = 1 \quad \text{for all } x \in \Omega.$

Page 34: MLPI Lecture 2: Monte Carlo Methods (Basics)

Evolution of State Distributions

Let $\mu_n$ be the distribution of $X_n$; then:

$\mu_{n+1}(y) = \sum_{x \in \Omega} \mu_n(x)\, P(x, y)$

We can simply write $\mu_{n+1} = \mu_n P$.
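The update $\mu_{n+1} = \mu_n P$ can be iterated directly. A pure-Python sketch (my own, not from the slides) for a two-state chain, where the iterates converge to the invariant distribution $\pi = (5/6, 1/6)$ solving $\pi P = \pi$:

```python
def step(mu, P):
    """One step of mu' = mu P for a row-stochastic matrix P (lists of lists)."""
    n = len(mu)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]

P = [[0.9, 0.1],
     [0.5, 0.5]]
mu = [1.0, 0.0]          # start deterministically in state 0
for _ in range(100):     # iterate mu_{n+1} = mu_n P
    mu = step(mu, P)
print([round(v, 4) for v in mu])  # prints [0.8333, 0.1667]
```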

Page 35: MLPI Lecture 2: Monte Carlo Methods (Basics)

Multi-step Transition Probabilities

• Consider two transition steps:

$\Pr(X_{n+2} = y \mid X_n = x) = \sum_{z} P(x, z)\, P(z, y) = P^2(x, y)$

• More generally, $\Pr(X_{n+m} = y \mid X_n = x) = P^m(x, y)$.

• Let $\mu_n$ be the distribution of $X_n$; then $\mu_{n+m} = \mu_n P^m$.

Page 36: MLPI Lecture 2: Monte Carlo Methods (Basics)

Classes of States

• A state $y$ is said to be accessible from state $x$, or $x$ leads to $y$, denoted by $x \to y$, if $P^n(x, y) > 0$ for some $n \ge 0$.

• States $x$ and $y$ are said to communicate with each other, denoted by $x \leftrightarrow y$, if $x \to y$ and $y \to x$.

• $\leftrightarrow$ is an equivalence relation on $\Omega$, which partitions $\Omega$ into communicating classes, where states within the same class communicate with each other.

Page 37: MLPI Lecture 2: Monte Carlo Methods (Basics)

Irreducibility

A Markov chain is said to be irreducible if it forms a single communicating class, or in other words, every state communicates with every other state.

Page 38: MLPI Lecture 2: Monte Carlo Methods (Basics)

Exercise 1

• Is this Markov chain irreducible?

• Please identify the communicating classes.

Page 39: MLPI Lecture 2: Monte Carlo Methods (Basics)

Periods of Markov Chains

• The period of a state $x$ is defined as

$d(x) = \gcd\{n \ge 1 : P^n(x, x) > 0\}$

• A state $x$ is said to be aperiodic if $d(x) = 1$.

• Period is a class property: if $x \leftrightarrow y$, then $d(x) = d(y)$.

Page 40: MLPI Lecture 2: Monte Carlo Methods (Basics)

Aperiodic Chains

• A Markov chain is called aperiodic if all its states are aperiodic.

• An irreducible Markov chain is aperiodic if there exists an aperiodic state.

• Laziness breaks periodicity: $P_\alpha = (1 - \alpha) P + \alpha I$ with $\alpha \in (0, 1)$.

Page 41: MLPI Lecture 2: Monte Carlo Methods (Basics)

First Return Time

• Suppose the chain is initially at state $x$; the first return time to state $x$ is defined to be

$T_x = \min\{n \ge 1 : X_n = x\}$

Note that $T_x$ is a random variable.

• We also define $f_x^{(n)} = \Pr(T_x = n)$, the probability that the chain returns to $x$ for the first time after $n$ steps.

Page 42: MLPI Lecture 2: Monte Carlo Methods (Basics)

Recurrence

A state $x$ is said to be recurrent if it is guaranteed to have a finite hitting time, as

$\Pr(T_x < \infty) = 1.$

Otherwise, $x$ is said to be transient.

Page 43: MLPI Lecture 2: Monte Carlo Methods (Basics)

Recurrence (cont'd)

• $x$ is recurrent iff it returns to $x$ infinitely often:

$\Pr(X_n = x \ \text{for infinitely many } n) = 1$

• Recurrence is a class property: if $x \leftrightarrow y$ and $x$ is recurrent, then $y$ is also recurrent.

• Every finite communicating class is recurrent.

• An irreducible finite Markov chain is recurrent.

Page 44: MLPI Lecture 2: Monte Carlo Methods (Basics)

Invariant Distributions

Consider a Markov chain with TPM $P$ on $\Omega$:

• A distribution $\pi$ over $\Omega$ is called an invariant distribution (or stationary distribution) if $\pi P = \pi$.

• An invariant distribution does NOT necessarily exist, nor need it be unique.

• Under certain conditions (ergodicity), there exists a unique invariant distribution $\pi$. In such cases, $\pi$ is often called an equilibrium distribution.

Page 45: MLPI Lecture 2: Monte Carlo Methods (Basics)

Exercise 2

• Is this chain irreducible?

• Is this chain periodic?

• Please compute the invariant distribution.

Page 46: MLPI Lecture 2: Monte Carlo Methods (Basics)

Positive Recurrence

• The expected return time of a state $x$ is defined to be $m_x = \mathbb{E}[T_x]$.

• When $x$ is transient, $m_x = \infty$. If $x$ is recurrent, $m_x$ is NOT necessarily finite.

• A recurrent state $x$ is called positive recurrent if $m_x < \infty$. Otherwise, it is called null recurrent.

Page 47: MLPI Lecture 2: Monte Carlo Methods (Basics)

Existence of Invariant Distributions

For an irreducible Markov chain, if some state is positive recurrent, then all states are positive recurrent and the chain has an invariant distribution $\pi$ given by

$\pi(x) = \frac{1}{m_x}$

Page 48: MLPI Lecture 2: Monte Carlo Methods (Basics)

Example: 1D Random Walk

• Under what condition is this chain recurrent?

• When it is recurrent, is it positive recurrent or null recurrent?

Page 49: MLPI Lecture 2: Monte Carlo Methods (Basics)

Ergodic Markov Chains

• An irreducible, aperiodic, and positive recurrent Markov chain is called an ergodic Markov chain, or simply an ergodic chain.

• A finite Markov chain is ergodic if and only if it is irreducible and aperiodic.

• A Markov chain is ergodic if it is aperiodic and there exists $n$ such that any state can be reached from any other state within $n$ steps with positive probability.

Page 50: MLPI Lecture 2: Monte Carlo Methods (Basics)

Convergence to Equilibrium

Let $P$ be the transition probability matrix of an ergodic Markov chain; then there exists a unique invariant distribution $\pi$, and with any initial distribution $\mu_0$,

$\mu_0 P^n \to \pi \quad \text{as } n \to \infty.$

Particularly, $P^n(x, y) \to \pi(y)$ as $n \to \infty$ for all $x, y$.

Page 51: MLPI Lecture 2: Monte Carlo Methods (Basics)

Ergodic Theorem

• The ergodic theorem relates the time mean to the space mean:

• Let $(X_n)$ be an ergodic Markov chain over $\Omega$ with equilibrium distribution $\pi$, and let $f$ be a measurable function on $\Omega$; then

$\frac{1}{n} \sum_{t=1}^{n} f(X_t) \to \mathbb{E}_\pi[f] \quad \text{almost surely.}$

Page 52: MLPI Lecture 2: Monte Carlo Methods (Basics)

Ergodic Theorem (cont'd)

• More generally, we have, for any positive integer $k$:

$\frac{1}{n} \sum_{t=1}^{n} f(X_{tk}) \to \mathbb{E}_\pi[f] \quad \text{almost surely.}$

• The ergodic theorem is the theoretical foundation for MCMC.

Page 53: MLPI Lecture 2: Monte Carlo Methods (Basics)

Total Variation Distance

• The total variation distance between two measures $\mu$ and $\nu$ is defined as

$\|\mu - \nu\|_{TV} = \sup_{A} |\mu(A) - \nu(A)|$

• The total variation distance is a metric. If $\Omega$ is countable, we have

$\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|$

Page 54: MLPI Lecture 2: Monte Carlo Methods (Basics)

Mixing Time

The time required by a Markov chain to get close to the equilibrium distribution is measured by the mixing time, defined as

$\tau(\varepsilon) = \min\{n : \max_{x} \|P^n(x, \cdot) - \pi\|_{TV} \le \varepsilon\}$

In particular, $\tau_{\mathrm{mix}} = \tau(1/4)$.

Page 55: MLPI Lecture 2: Monte Carlo Methods (Basics)

Spectral Representation

• Let $P$ be a finite stochastic matrix over $\Omega$ with $|\Omega| = m$.

• The spectral radius of $P$, namely the maximum absolute value of all its eigenvalues, is $1$.

• Assume $P$ is ergodic and reversible with equilibrium distribution $\pi$.

• Define an inner product: $\langle f, g \rangle_\pi = \sum_{x \in \Omega} \pi(x)\, f(x)\, g(x)$

Page 56: MLPI Lecture 2: Monte Carlo Methods (Basics)

Spectral Representation (cont'd)

All eigenvalues of $P$ are real, given by $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge -1$. Let $v_i$ be the right eigenvector associated with $\lambda_i$, orthonormal under $\langle \cdot, \cdot \rangle_\pi$. Then the left eigenvector is $\pi \odot v_i$ (element-wise product), and $P$ can be represented as

$P(x, y) = \pi(y) \sum_{i=1}^{m} \lambda_i\, v_i(x)\, v_i(y)$

Page 57: MLPI Lecture 2: Monte Carlo Methods (Basics)

Spectral Gap

Let $\lambda_* = \max\{|\lambda_i| : 2 \le i \le m\}$. Then the spectral gap is defined to be $\gamma = 1 - \lambda_2$, and the absolute spectral gap is defined to be $\gamma_* = 1 - \lambda_*$. Then:

$\|P^n(x, \cdot) - \pi\|_{TV} \le \frac{\lambda_*^n}{2 \sqrt{\pi_{\min}}}$

Here, $\pi_{\min} = \min_{x} \pi(x)$.

Page 58: MLPI Lecture 2: Monte Carlo Methods (Basics)

Bounds of Mixing Time

The mixing time can be bounded by the inverse of the absolute spectral gap:

$\tau(\varepsilon) \le \frac{1}{\gamma_*} \log\!\left(\frac{1}{\varepsilon\, \pi_{\min}}\right)$

Generally, the goal in designing a rapidly mixing reversible Markov chain is to maximize the absolute spectral gap $\gamma_*$.

Page 59: MLPI Lecture 2: Monte Carlo Methods (Basics)

Ergodic Flow

Consider an ergodic Markov chain on a finite space $\Omega$ with transition probability matrix $P$ and equilibrium distribution $\pi$:

• The ergodic flow from a subset $A$ to another subset $B$ is defined as

$Q(A, B) = \sum_{x \in A} \sum_{y \in B} \pi(x)\, P(x, y)$

Page 60: MLPI Lecture 2: Monte Carlo Methods (Basics)

Conductance

The conductance of a Markov chain is defined as

$\Phi = \min_{A:\, 0 < \pi(A) \le 1/2} \frac{Q(A, A^c)}{\pi(A)}$

Page 61: MLPI Lecture 2: Monte Carlo Methods (Basics)

Bounds of Spectral Gap

(Jerrum and Sinclair (1989)) The spectral gap is bounded by

$\frac{\Phi^2}{2} \le \gamma \le 2\Phi$

Page 62: MLPI Lecture 2: Monte Carlo Methods (Basics)

Exercise 3

Consider an ergodic finite chain with TPM $P$. To improve the mixing time, one can add a little bit of laziness, as $P_\alpha = (1 - \alpha) P + \alpha I$.

Please solve for the optimal value of $\alpha$ that maximizes the absolute spectral gap $\gamma_*$.

Page 63: MLPI Lecture 2: Monte Carlo Methods (Basics)

Exercise 4

Consider a stochastic matrix $P$.

• Please specify the condition under which $P$ is ergodic.

• What is the equilibrium distribution when $P$ is ergodic?

• Solve for the optimal value of the parameter that maximizes the absolute spectral gap.

Page 64: MLPI Lecture 2: Monte Carlo Methods (Basics)

General Markov Chains

Next, we extend the formulation of Markov chains from countable spaces to general measurable spaces.

• First, the Markov property remains.

• But the transition probability matrix makes no sense in general.

Page 65: MLPI Lecture 2: Monte Carlo Methods (Basics)

General Markov Chains (cont'd)

• Generally, a homogeneous Markov chain, denoted by $(X_n)$, over a measurable space $\Omega$ is characterized by

• an initial measure $\mu_0$, and

• a transition probability kernel $K$:

$\Pr(X_{n+1} \in A \mid X_n = x) = K(x, A)$

Page 66: MLPI Lecture 2: Monte Carlo Methods (Basics)

Stochastic Kernel

The transition probability kernel $K$ is a stochastic kernel:

• Given $x$, $K(x, \cdot)$ is a probability measure over $\Omega$.

• Given a measurable subset $A$, $x \mapsto K(x, A)$ is a measurable function.

• When $\Omega$ is a countable space, $K$ reduces to a stochastic matrix.

Page 67: MLPI Lecture 2: Monte Carlo Methods (Basics)

Evolution of State Distributions

Suppose the distribution of $X_n$ is $\mu_n$; then

$\mu_{n+1}(A) = \int_\Omega K(x, A)\, \mu_n(dx)$

Again, we can simply write this as $\mu_{n+1} = \mu_n K$.

Page 68: MLPI Lecture 2: Monte Carlo Methods (Basics)

Composition of Stochastic Kernels

• The composition of stochastic kernels $K_1$ and $K_2$ remains a stochastic kernel, denoted by $K_1 K_2$ and defined as:

$(K_1 K_2)(x, A) = \int_\Omega K_1(x, dy)\, K_2(y, A)$

• Recursive composition of $K$ for $n$ times results in a stochastic kernel denoted by $K^n$, and we have

$K^{m+n} = K^m K^n$ and $\mu_{m+n} = \mu_m K^n$.

Page 69: MLPI Lecture 2: Monte Carlo Methods (Basics)

Example: Random Walk in $\mathbb{R}^d$

Here, the stochastic kernel is given by $K(x, A) = \int_A g(y - x)\, dy$, where $g$ is the density of the step distribution.

Page 70: MLPI Lecture 2: Monte Carlo Methods (Basics)

Occupation Time, Return Time, and Hitting Time

Let $A$ be a measurable set:

• The occupation time: $\eta_A = \sum_{n=1}^{\infty} \mathbf{1}(X_n \in A)$.

• The return time: $\tau_A = \min\{n \ge 1 : X_n \in A\}$.

• The hitting time: $\sigma_A = \min\{n \ge 0 : X_n \in A\}$.

• $\eta_A$, $\tau_A$, and $\sigma_A$ are all random variables.

Page 71: MLPI Lecture 2: Monte Carlo Methods (Basics)

$\varphi$-irreducibility

• Define $L(x, A) = \Pr(\tau_A < \infty \mid X_0 = x)$.

• Given a positive measure $\varphi$ over $\Omega$, a Markov chain is called $\varphi$-irreducible if $L(x, A) > 0$ for every $x$ whenever $A$ is $\varphi$-positive, i.e. $\varphi(A) > 0$.

• Intuitively, it means that for any $\varphi$-positive set $A$, there is a positive chance that the chain enters $A$ within finite time, no matter where it begins.

Page 72: MLPI Lecture 2: Monte Carlo Methods (Basics)

$\varphi$-irreducibility (cont'd)

• A Markov chain over $\Omega$ is $\varphi$-irreducible if and only if either of the following statements holds:

Page 73: MLPI Lecture 2: Monte Carlo Methods (Basics)

$\varphi$-irreducibility (cont'd)

• Typical spaces usually come with a natural measure $\varphi$:

• The natural measure for a countable space is the counting measure. In this case, the notion of $\varphi$-irreducibility coincides with the one introduced earlier.

• The natural measure for $\mathbb{R}$, $\mathbb{R}^d$, or a finite-dimensional manifold is the Lebesgue measure.

Page 74: MLPI Lecture 2: Monte Carlo Methods (Basics)

Transience and Recurrence

Given a Markov chain $(X_n)$ over $\Omega$, and a measurable set $A$:

• $A$ is called transient if $\mathbb{E}[\eta_A \mid X_0 = x] < \infty$ for every $x$.

• $A$ is called uniformly transient if there exists $M < \infty$ such that $\mathbb{E}[\eta_A \mid X_0 = x] \le M$ for every $x$.

• $A$ is called recurrent if $\mathbb{E}[\eta_A \mid X_0 = x] = \infty$ for every $x$.

Page 75: MLPI Lecture 2: Monte Carlo Methods (Basics)

Transience and Recurrence (cont'd)

• Consider a $\varphi$-irreducible chain; then either:

• every $\varphi$-positive subset is recurrent, in which case we call the chain recurrent, or

• $\Omega$ is covered by countably many uniformly transient sets, in which case we call the chain transient.

Page 76: MLPI Lecture 2: Monte Carlo Methods (Basics)

Invariant Measures

• A measure $\mu$ is called an invariant measure w.r.t. the stochastic kernel $K$ if $\mu K = \mu$, i.e.

$\mu(A) = \int_\Omega K(x, A)\, \mu(dx) \quad \text{for all measurable } A.$

• A recurrent Markov chain admits a unique invariant measure $\mu$ (up to a scale constant).

• Note: this measure $\mu$ can be finite or infinite.

Page 77: MLPI Lecture 2: Monte Carlo Methods (Basics)

Positive Chains

• A Markov chain is called positive if it is irreducible and admits an invariant probability measure $\pi$.

• The study of the existence of $\pi$ requires more sophisticated analysis.

• We are not going into these details, as in MCMC practice, the existence of $\pi$ is usually not an issue.

Page 78: MLPI Lecture 2: Monte Carlo Methods (Basics)

Subsampled Chains

Let $K$ be a stochastic kernel and $a = (a_1, a_2, \ldots)$ be a probability vector over $\mathbb{N}$. Then $K_a$ defined as below is also a stochastic kernel:

$K_a(x, A) = \sum_{n=1}^{\infty} a_n\, K^n(x, A)$

The chain with kernel $K_a$ is called a subsampled chain with $a$.

Page 79: MLPI Lecture 2: Monte Carlo Methods (Basics)

Subsampled Chains (cont'd)

• When $a$ puts all its mass on a single $m$, $K_a = K^m$.

• If $\mu$ is invariant w.r.t. $K$, then $\mu$ is also invariant w.r.t. $K_a$.

Page 80: MLPI Lecture 2: Monte Carlo Methods (Basics)

Stationary Process

• A stochastic process $(X_n)$ is called stationary if

$(X_{n_1}, \ldots, X_{n_k}) \overset{d}{=} (X_{n_1 + m}, \ldots, X_{n_k + m}) \quad \text{for all } m \ge 0 \text{ and } n_1, \ldots, n_k$

• A Markov chain is stationary if it has an invariant probability measure and that is also its initial distribution.

Page 81: MLPI Lecture 2: Monte Carlo Methods (Basics)

Birkhoff Ergodic Theorem

• (Birkhoff Ergodic Theorem) Every irreducible stationary Markov chain $(X_n)$ is ergodic; that is, for any real-valued measurable function $f$:

$\frac{1}{n} \sum_{t=1}^{n} f(X_t) \to \mathbb{E}_\pi[f] \quad \text{almost surely,}$

where $\pi$ is the invariant probability measure.

Page 82: MLPI Lecture 2: Monte Carlo Methods (Basics)

Markov Chain Monte Carlo

(Markov Chain Monte Carlo): To sample from a target distribution $\pi$:

• We first construct a Markov chain with transition probability kernel $K$ such that $\pi K = \pi$.

• This is the most difficult part.

Page 83: MLPI Lecture 2: Monte Carlo Methods (Basics)

Markov Chain Monte Carlo (cont'd)

• Then we simulate the chain, usually in two stages:

• (Burn-in stage) simulate the chain and ignore all samples, until it gets close to stationarity.

• (Sampling stage) collect samples from a subsampled chain with kernel $K^m$ or $K_a$.

• Approximate the expectation of the function of interest using the sample mean.

Page 84: MLPI Lecture 2: Monte Carlo Methods (Basics)

Detailed Balance

Most Markov chains in MCMC practice fall into a special family: reversible chains.

• A distribution $\pi$ over a countable space is said to be in detailed balance with $P$ if

$\pi(x)\, P(x, y) = \pi(y)\, P(y, x) \quad \text{for all } x, y.$

• Detailed balance implies invariance.

• The converse is not true.
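Both claims are easy to check numerically. A small sketch (my own, not from the slides) for the two-state chain used earlier: detailed balance holds pairwise, and summing it over $x$ gives invariance $\pi P = \pi$:

```python
def detailed_balance(pi, P, tol=1e-12):
    """Check pi(x) P(x,y) == pi(y) P(y,x) for all pairs of states."""
    n = len(pi)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < tol
               for x in range(n) for y in range(n))

P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [5 / 6, 1 / 6]                       # in detailed balance with P
print(detailed_balance(pi, P))            # prints True
# Detailed balance implies invariance: pi P == pi.
piP = [sum(pi[x] * P[x][y] for x in range(2)) for y in range(2)]
print(piP)
```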

Page 85: MLPI Lecture 2: Monte Carlo Methods (Basics)

Reversible Chains

Consider an irreducible Markov chain $(X_n)$ with transition probability matrix $P$ and an invariant distribution $\pi$:

• This Markov chain is called reversible if $\pi$ is in detailed balance with $P$.

• Under this condition, when initialized with $\mu_0 = \pi$, it has:

$(X_0, X_1, \ldots, X_n) \overset{d}{=} (X_n, X_{n-1}, \ldots, X_0)$

Page 86: MLPI Lecture 2: Monte Carlo Methods (Basics)

Reversible Chains on General Spaces

Over a general measurable space $\Omega$, a stochastic kernel $K$ is called reversible w.r.t. a probability measure $\pi$ if

$\iint f(x, y)\, \pi(dx)\, K(x, dy) = \iint f(y, x)\, \pi(dx)\, K(x, dy)$

for any bounded measurable function $f$.

• If $K$ is reversible w.r.t. $\pi$, then $\pi$ is invariant w.r.t. $K$.

Page 87: MLPI Lecture 2: Monte Carlo Methods (Basics)

Detailed Balance on General Spaces

Suppose both $\pi$ and $K(x, \cdot)$ are absolutely continuous w.r.t. a base measure $\lambda$:

$\pi(dx) = p(x)\, \lambda(dx), \qquad K(x, dy) = k(x, y)\, \lambda(dy)$

Then the chain is reversible if and only if

$p(x)\, k(x, y) = p(y)\, k(y, x) \quad \text{for } \lambda\text{-almost every } x, y,$

which is called the detailed balance condition.

Page 88: MLPI Lecture 2: Monte Carlo Methods (Basics)

Detailed Balance on General Spaces (cont'd)

More generally, if

$K(x, dy) = k(x, y)\, \lambda(dy) + r(x)\, \delta_x(dy),$

where $\delta_x$ is the point mass at $x$, then the chain is reversible iff

$p(x)\, k(x, y) = p(y)\, k(y, x)$

Page 89: MLPI Lecture 2: Monte Carlo Methods (Basics)

Metropolis-Hastings: Overview

• In MCMC practice, the target distribution $\pi$ is usually known only up to an unnormalized density $\tilde p$, such that $\pi$ has density $\tilde p(x) / Z$, and the normalizing constant $Z$ is often intractable to compute.

• The Metropolis-Hastings algorithm (M-H algorithm) is a classical and popular approach to MCMC sampling, which requires only the unnormalized density $\tilde p$.

Page 90: MLPI Lecture 2: Monte Carlo Methods (Basics)

Metropolis-Hastings: How it Works

1. It is associated with a proposal kernel $Q$.

2. At each iteration, a candidate is generated from $Q$ given the current state $x$.

3. With a certain acceptance ratio, which depends on both $\tilde p$ and $Q$, the candidate is accepted.

• The acceptance ratio is determined in a way that maintains detailed balance, so the resultant chain is reversible w.r.t. $\pi$.

Page 91: MLPI Lecture 2: Monte Carlo Methods (Basics)

Metropolis Algorithm

• The Metropolis algorithm is a precursor (and a special case) of the M-H algorithm, which requires the designer to provide a symmetric kernel $Q$, i.e. $q(x, y) = q(y, x)$, where $q$ is the density of $Q$.

• Note: $\pi$ is not necessarily invariant to $Q$.

• A Gaussian random walk is a symmetric kernel.

Page 92: MLPI Lecture 2: Monte Carlo Methods (Basics)

Metropolis Algorithm (cont'd)

• At each iteration, with current state $x$:

1. Generate a candidate $x'$ from $Q(x, \cdot)$

2. Accept the candidate with acceptance ratio $\min\!\left(1,\ \tilde p(x') / \tilde p(x)\right)$.

• The Metropolis update satisfies detailed balance. Why?
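The two steps above can be sketched in a few lines of Python (my own illustration, not from the slides), using a Gaussian random walk as the symmetric proposal and an unnormalized $\mathcal{N}(3, 1)$ density as the target; note $Z$ never appears:

```python
import math
import random

def metropolis(p_tilde, x0, step, n, rng):
    """Metropolis algorithm with a Gaussian random-walk (symmetric) proposal,
    targeting the unnormalized density p_tilde."""
    x, chain = x0, []
    for _ in range(n):
        x_new = x + rng.gauss(0, step)                     # propose
        if rng.random() < min(1.0, p_tilde(x_new) / p_tilde(x)):
            x = x_new                                      # accept, else keep x
        chain.append(x)
    return chain

rng = random.Random(0)
# Unnormalized N(3, 1) density; the constant 1/sqrt(2*pi) is never needed.
chain = metropolis(lambda x: math.exp(-0.5 * (x - 3.0) ** 2), 0.0, 1.0, 50_000, rng)
burned = chain[5_000:]                                     # discard burn-in
est = sum(burned) / len(burned)
print(est)  # should be near the target mean 3.0
```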

Page 93: MLPI Lecture 2: Monte Carlo Methods (Basics)

Metropolis-Hastings Algorithm

• The Metropolis-Hastings algorithm requires a proposal kernel $Q$.

• $Q$ is NOT necessarily symmetric and does NOT necessarily admit $\pi$ as an invariant measure.

Page 94: MLPI Lecture 2: Monte Carlo Methods (Basics)

Metropolis-Hastings Algorithm (cont'd)

• At each iteration, with current state $x$:

• Generate a candidate $x'$ from $Q(x, \cdot)$

• Accept the candidate with acceptance ratio $\alpha(x, x')$, with

$\alpha(x, x') = \min\!\left(1,\ \frac{\tilde p(x')\, q(x', x)}{\tilde p(x)\, q(x, x')}\right)$

• The Metropolis-Hastings update satisfies detailed balance. Why?

Page 95: MLPI Lecture 2: Monte Carlo Methods (Basics)

Gibbs Sampling

• The Gibbs sampler was introduced by Geman and Geman (1984) for sampling from a Markov random field over images, and popularized by Gelfand and Smith (1990).

• Each state $x$ is comprised of multiple components $(x_1, \ldots, x_d)$.

Page 96: MLPI Lecture 2: Monte Carlo Methods (Basics)

Gibbs Sampling (cont'd)

At each iteration, following a permutation $\sigma$ over $\{1, \ldots, d\}$: for $k = 1, \ldots, d$, let $i = \sigma(k)$, and update $x$ by re-drawing $x_i$ conditioned on all other components:

$x_i \sim p(x_i \mid x_{-i})$

Here, $p(x_i \mid x_{-i})$ indicates the conditional distribution of $x_i$ given all other components.

Page 97: MLPI Lecture 2: Monte Carlo Methods (Basics)

Gibbs Sampling (cont'd)

• At each iteration, one can use either a random scan or a fixed scan.

• Different schedules can be used at different iterations to scan the components.

• The Gibbs update is a special case of the M-H update, and thus satisfies detailed balance. Why?
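A classic textbook instance of the scan above (my own Python sketch, not from the slides) is a bivariate normal with zero means, unit variances, and correlation $\rho$, where each full conditional is itself normal: $x \mid y \sim \mathcal{N}(\rho y,\, 1 - \rho^2)$ and symmetrically for $y$:

```python
import math
import random

def gibbs_bvn(rho, n, rng):
    """Gibbs sampler for a bivariate normal with zero means, unit variances,
    and correlation rho: each conditional is N(rho * other, 1 - rho^2)."""
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    chain = []
    for _ in range(n):
        x = rng.gauss(rho * y, sd)   # re-draw x | y
        y = rng.gauss(rho * x, sd)   # re-draw y | x
        chain.append((x, y))
    return chain

rng = random.Random(0)
rho = 0.8
chain = gibbs_bvn(rho, 100_000, rng)[10_000:]   # drop burn-in
corr = sum(x * y for x, y in chain) / len(chain)
print(corr)  # E[XY] = rho for this target, so should be near 0.8
```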

Page 98: MLPI Lecture 2: Monte Carlo Methods (Basics)

Example 1: Gaussian Mixture Model

Page 99: MLPI Lecture 2: Monte Carlo Methods (Basics)

Example 1: Gibbs Sampling

Given the observed data, initialize the component parameters and the labels.

Conditioned on the data and the current parameters, re-draw the labels:

Page 100: MLPI Lecture 2: Monte Carlo Methods (Basics)

Example 1: Gibbs Sampling (cont'd)

Conditioned on the data and the labels, re-draw the component parameters.

Page 101: MLPI Lecture 2: Monte Carlo Methods (Basics)

Example 2: Ising Model

The normalizing constant $Z$ is usually unknown and intractable to compute.

Page 102: MLPI Lecture 2: Monte Carlo Methods (Basics)

Example 2: Gibbs Sampling

• Let $x_{-i}$ denote the entire $x$ vector except for one entry $x_i$.

• How can we schedule the computation so that many updates can be done in parallel?

• Coloring

Page 103: MLPI Lecture 2: Monte Carlo Methods (Basics)

Mixture of MCMC Kernels

Let $K_1, \ldots, K_m$ be stochastic kernels with the same invariant probability measure $\pi$:

• Let $(a_1, \ldots, a_m)$ be a probability vector; then $K = \sum_{i=1}^{m} a_i K_i$

remains a stochastic kernel with invariant probability measure $\pi$.

• Furthermore, if $K_1, \ldots, K_m$ are all reversible, then $K$ is reversible.

Page 104: MLPI Lecture 2: Monte Carlo Methods (Basics)

Composition of MCMC Kernels

• The composition $K = K_1 K_2 \cdots K_m$ is also a stochastic kernel with invariant probability measure $\pi$.

• Note: $K$ is generally not reversible even when $K_1, \ldots, K_m$ are all reversible, except in special cases (e.g. when the kernels commute).