CS388: Natural Language Processing, Lecture 5: CRFs (Greg Durrett, gdurrett/courses/fa2019/...)


Page 1:

CS388: Natural Language Processing

Greg Durrett

Lecture 5: CRFs

Page 2:

Administrivia

‣ Project 1 is out, sample writeups on website

‣ Mini 1 grading underway

Page 3:

Recall: HMMs

‣ Observations $O$ (= input $\mathbf{x}$); output $Q$ (sequence of states) = labels $\mathbf{y}$

[lattice: states $y_1, y_2, \ldots, y_n$ over observations $x_1, x_2, \ldots, x_n$]

$P(\mathbf{y}, \mathbf{x}) = P(y_1) \prod_{i=2}^{n} P(y_i \mid y_{i-1}) \prod_{i=1}^{n} P(x_i \mid y_i)$

‣ Inference problem: $\mathrm{argmax}_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x}) = \mathrm{argmax}_{\mathbf{y}} \frac{P(\mathbf{y}, \mathbf{x})}{P(\mathbf{x})}$

‣ Viterbi: $\mathrm{score}_i(s) = \max_{y_{i-1}} P(s \mid y_{i-1})\, P(x_i \mid s)\, \mathrm{score}_{i-1}(y_{i-1})$

‣ Training: maximum likelihood estimation (with smoothing)
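The Viterbi recurrence above translates to code almost directly. A minimal sketch with toy parameter tables (the names `init`, `trans`, `emit` and the two-state example are illustrative, not from the course code):

```python
def viterbi(init, trans, emit, obs, states):
    # score[i][s] = max over tag prefixes ending in state s at position i (the recurrence above)
    score = [{s: init[s] * emit[s][obs[0]] for s in states}]
    back = []
    for i in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # score_i(s) = max_{y_{i-1}} P(s | y_{i-1}) P(x_i | s) score_{i-1}(y_{i-1})
            prev = max(states, key=lambda sp: score[i - 1][sp] * trans[sp][s])
            score[i][s] = score[i - 1][prev] * trans[prev][s] * emit[s][obs[i]]
            back[i - 1][s] = prev
    # Follow backpointers from the best final state to recover argmax_y P(y, x)
    best = max(states, key=lambda s: score[-1][s])
    path = [best]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return path[::-1]

# Toy two-state HMM with made-up probabilities, for illustration only
states = ["N", "V"]
init = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dog": 0.6, "runs": 0.1}, "V": {"dog": 0.1, "runs": 0.7}}
print(viterbi(init, trans, emit, ["dog", "runs"], states))
```

Each position keeps one backpointer per state, so decoding is $O(n |S|^2)$ rather than $O(|S|^n)$.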

Page 4:

Recall: Viterbi Algorithm

‣ $a_0$: initial state distribution; $a_{ij}$: probability of $i \to j$ transition; $b_j(o_t)$: probability of emitting symbol $o_t$ from state $j$

slide credit: Ray Mooney

Page 5:

Viterbi/HMMs: Other Resources

‣ Lecture notes from my undergrad course (posted online)

‣ Eisenstein Chapter 7.3, but the notation covers a more general case than what's discussed for HMMs

‣ Jurafsky + Martin 8.4.5

Page 6:

This Lecture

‣ Named entity recognition (NER)

‣ CRFs: model (+ features for NER), inference, learning

‣ (if time) Beam search

Page 7:

Named Entity Recognition

Barack Obama will travel to Hangzhou today for the G20 meeting.

Tags: B-PER I-PER O O O B-LOC O O O B-ORG O O
(spans: Barack Obama = PERSON, Hangzhou = LOC, G20 = ORG)

‣ BIO tag set: begin, inside, outside

‣ Sequence of tags: should we use an HMM?

‣ Why might an HMM not do so well here?

‣ Lots of O's

‣ Insufficient features/capacity with multinomials (especially for unks)

Page 8:

CRFs

Page 9:

Where we're going

‣ Flexible discriminative model for tagging tasks that can use arbitrary features of the input. Similar to logistic regression, but structured

Barack Obama will travel to Hangzhou today for the G20 meeting.
B-PER I-PER

Curr_word=Barack & Label=B-PER
Next_word=Obama & Label=B-PER
Curr_word_starts_with_capital=True & Label=B-PER
Posn_in_sentence=1st & Label=B-PER
Label=B-PER & Next-Label=I-PER
…

Page 10:

HMMs, Formally

‣ HMMs are expressible as Bayes nets (factor graphs)

[chain: $y_1 \to y_2 \to \cdots \to y_n$, each state $y_i$ emitting observation $x_i$]

‣ This reflects the following decomposition:

$P(\mathbf{y}, \mathbf{x}) = P(y_1) P(x_1 \mid y_1) P(y_2 \mid y_1) P(x_2 \mid y_2) \ldots$

‣ Locally normalized model: each factor is a probability distribution that normalizes

Page 11:

Conditional Random Fields

‣ HMMs: $P(\mathbf{y}, \mathbf{x}) = P(y_1) P(x_1 \mid y_1) P(y_2 \mid y_1) P(x_2 \mid y_2) \ldots$

‣ CRFs: discriminative models with the following globally-normalized form:

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_k \exp(\phi_k(\mathbf{x}, \mathbf{y}))$

where each $\phi_k$ can be any real-valued scoring function of its arguments and $Z$ is the normalizer.

‣ Special case: linear feature-based potentials $\phi_k(\mathbf{x}, \mathbf{y}) = w^\top f_k(\mathbf{x}, \mathbf{y})$:

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \exp\left( \sum_{k=1}^{n} w^\top f_k(\mathbf{x}, \mathbf{y}) \right)$

‣ Looks like our single-weight-vector multiclass logistic regression model

Page 12:

HMMs vs. CRFs

‣ Naive Bayes : logistic regression :: HMMs : CRFs
(local vs. global normalization <-> generative vs. discriminative)

‣ (locally normalized discriminative models do exist (MEMMs))

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \exp\left( \sum_{k=1}^{n} w^\top f_k(\mathbf{x}, \mathbf{y}) \right)$

[factor graph: features $f_1, f_2, f_3$ connecting $y_1, y_2$ to $x_1, x_2, x_3$]

‣ Conditional model: x's are observed

‣ HMMs: in the standard setup, emissions consider one word at a time

‣ CRFs: features over many words simultaneously, non-independent features (e.g., suffixes and prefixes); doesn't have to be a generative model

Page 13:

Problem with CRFs

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \exp\left( \sum_{k=1}^{n} w^\top f_k(\mathbf{x}, \mathbf{y}) \right)$

‣ Normalizing constant: $Z = \sum_{\mathbf{y}'} \exp\left( \sum_{k=1}^{n} w^\top f_k(\mathbf{x}, \mathbf{y}') \right)$

‣ Inference: $\mathbf{y}_{best} = \mathrm{argmax}_{\mathbf{y}'} \exp\left( \sum_{k=1}^{n} w^\top f_k(\mathbf{x}, \mathbf{y}') \right)$

‣ If $\mathbf{y}$ consists of 5 variables with 30 values each, how expensive are these?

‣ Need to constrain the form of our CRFs to make it tractable
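To see the blowup concretely: both $Z$ and the argmax range over every possible label sequence. A quick sketch with a made-up scoring function standing in for $\sum_k w^\top f_k(\mathbf{x}, \mathbf{y})$; brute-force enumeration is only feasible for tiny label spaces:

```python
import itertools
import math

def brute_force_log_Z(labels, n, score):
    # Z = sum over ALL label sequences y' of exp(score(y')): |labels|**n terms
    return math.log(sum(math.exp(score(y))
                        for y in itertools.product(labels, repeat=n)))

# Toy score (illustrative only): half the number of "on" labels
score = lambda y: 0.5 * sum(y)
print(brute_force_log_Z((0, 1), 3, score))  # 2**3 = 8 terms: fine
print(30 ** 5)  # 5 variables, 30 values each: 24,300,000 terms
```

The sequential factorization introduced next is what replaces this enumeration with dynamic programming.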

Page 14:

Sequential CRFs

Sequential CRF (one form):

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

‣ Two types of factors: transitions $\phi_t$ (look at adjacent y's, but not x) and emissions $\phi_e$ (look at y and all of x)

‣ Notation: omit x from the factor graph entirely (implicit), but every feature function connects to it

Page 15:

Features for NER

Page 16:

Feature Functions

‣ Phis are flexible (can be a NN with 1B+ parameters). Here: sparse linear functions

$\phi_t(y_{i-1}, y_i) = w^\top f_t(y_{i-1}, y_i)$ and $\phi_e(y_i, i, \mathbf{x}) = w^\top f_e(y_i, i, \mathbf{x})$, so

$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

(looks like Mini 1 features)

Page 17:

Basic Features for NER

Barack Obama will travel to Hangzhou today for the G20 meeting.
(consider $y_5$ = O on "to" and $y_6$ = B-LOC on "Hangzhou")

$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

Transitions: $f_t(y_{i-1}, y_i) = \mathrm{Ind}[y_{i-1}\ \&\ y_i]$, here = Ind[O, B-LOC]

Emissions: $f_e(y_6, 6, \mathbf{x})$ = Ind[B-LOC & Current word = Hangzhou], Ind[B-LOC & Prev word = to]
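A minimal sketch of indicator features like those above. The feature-string formats and helper names are my own conventions, not the assignment's reference implementation:

```python
def transition_features(prev_tag, tag):
    # f_t(y_{i-1}, y_i): a single indicator on the tag pair
    return ["Trans=" + prev_tag + "--" + tag]

def emission_features(tag, i, words):
    # f_e(y_i, i, x): indicators pairing the tag with properties of position i;
    # note these can look at ANY part of x, not just the current word
    prev_word = words[i - 1] if i > 0 else "<s>"
    return [
        "Curr_word=" + words[i] + "&Label=" + tag,
        "Prev_word=" + prev_word + "&Label=" + tag,
    ]

words = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
print(emission_features("B-LOC", 5, words))
print(transition_features("O", "B-LOC"))
```

In a linear CRF each such string indexes one weight in $w$, so a potential is just a sum of the weights of the features that fire.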

Page 18:

Emission Features for NER

$\phi_e(y_i, i, \mathbf{x})$

Leicestershire [LOC] is a nice place to visit…

I took a vacation to Boston [LOC]

Apple [ORG] released a new version…

According to the New York Times [ORG]…

Texas [LOC] governor Greg Abbott [PER] said

Leonardo DiCaprio [PER] won an award…

Page 19:

Emission Features for NER

‣ Word features (can use in HMM)
  ‣ Capitalization
  ‣ Word shape
  ‣ Prefixes/suffixes
  ‣ Lexical indicators

‣ Context features (can't use in HMM!)
  ‣ Words before/after
  ‣ Tags before/after

‣ Gazetteers
‣ Word clusters

Examples: Leicestershire, Boston, "Apple released a new version…", "According to the New York Times…"
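Word shape, for instance, is commonly implemented by collapsing characters into classes; a small sketch using one common convention (uppercase to "X", lowercase to "x", digit to "d"; the exact mapping is my choice, not specified in the slides):

```python
def word_shape(word):
    # Map each character to its class; punctuation passes through unchanged
    return "".join("X" if c.isupper()
                   else "x" if c.islower()
                   else "d" if c.isdigit()
                   else c
                   for c in word)

print(word_shape("G20"))       # Xdd
print(word_shape("Hangzhou"))  # Xxxxxxxx
```

Shapes generalize across unknown words: any capitalized token followed by digits fires the same "Xdd"-style feature.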

Page 20:

CRFs Outline

‣ Model: $P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

‣ Inference

‣ Learning

Page 21:

Inference and Learning in CRFs

Page 22:

Computing (arg)maxes

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

‣ $\mathrm{argmax}_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})$: can use Viterbi exactly as in HMM case

‣ $\exp(\phi_t(y_{i-1}, y_i))$ and $\exp(\phi_e(y_i, i, \mathbf{x}))$ play the role of the Ps now, same dynamic program:

$\max_{y_1, \ldots, y_n} e^{\phi_t(y_{n-1}, y_n)} e^{\phi_e(y_n, n, \mathbf{x})} \cdots e^{\phi_e(y_2, 2, \mathbf{x})} e^{\phi_t(y_1, y_2)} e^{\phi_e(y_1, 1, \mathbf{x})}$

$= \max_{y_2, \ldots, y_n} e^{\phi_t(y_{n-1}, y_n)} e^{\phi_e(y_n, n, \mathbf{x})} \cdots e^{\phi_e(y_2, 2, \mathbf{x})} \max_{y_1} e^{\phi_t(y_1, y_2)} e^{\phi_e(y_1, 1, \mathbf{x})}$

$= \max_{y_3, \ldots, y_n} e^{\phi_t(y_{n-1}, y_n)} e^{\phi_e(y_n, n, \mathbf{x})} \cdots \max_{y_2} e^{\phi_t(y_2, y_3)} e^{\phi_e(y_2, 2, \mathbf{x})} \max_{y_1} e^{\phi_t(y_1, y_2)} \mathrm{score}_1(y_1)$
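In code, "same dynamic program" means swapping products of probabilities for sums of log-potentials. A sketch assuming the potentials are precomputed as plain tables (`phi_e[i][s]`, `phi_t[s_prev][s]`; names and the toy values are illustrative):

```python
def crf_viterbi(phi_e, phi_t, states):
    # score[i][s] = max over label prefixes ending with y_i = s of the summed log-potentials
    n = len(phi_e)
    score = [{s: phi_e[0][s] for s in states}]
    back = []
    for i in range(1, n):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda sp: score[i - 1][sp] + phi_t[sp][s])
            score[i][s] = score[i - 1][prev] + phi_t[prev][s] + phi_e[i][s]
            back[i - 1][s] = prev
    best = max(states, key=lambda s: score[-1][s])
    path = [best]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return path[::-1]

# Toy potentials over two tags (made-up numbers)
phi_e = [{"O": 1.0, "B": 0.0}, {"O": 0.0, "B": 2.0}, {"O": 1.5, "B": 0.0}]
phi_t = {"O": {"O": 0.2, "B": 0.0}, "B": {"O": 0.0, "B": -1.0}}
print(crf_viterbi(phi_e, phi_t, ["O", "B"]))
```

Working in log space here is not just convenience: it sidesteps the underflow issues the implementation tips warn about.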

Page 23:

Inference in General CRFs

‣ Can do efficient inference in any tree-structured CRF

‣ Max-product algorithm: generalization of Viterbi to arbitrary tree-structured graphs (sum-product is the generalization of forward-backward)

Page 24:

CRFs Outline

‣ Model: $P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

‣ Inference: argmax P(y|x) from Viterbi

‣ Learning

Page 25:

Training CRFs

‣ Logistic regression: $P(y \mid \mathbf{x}) \propto \exp w^\top f(\mathbf{x}, y)$

‣ CRF: $P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

‣ Maximize $\mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \log P(\mathbf{y}^* \mid \mathbf{x})$

‣ Gradient is completely analogous to logistic regression:

$\frac{\partial}{\partial w} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=2}^{n} f_t(y_{i-1}^*, y_i^*) + \sum_{i=1}^{n} f_e(y_i^*, i, \mathbf{x}) - \mathbb{E}_{\mathbf{y}} \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

(the expectation term looks intractable!)

Page 26:

Training CRFs

$\frac{\partial}{\partial w} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=2}^{n} f_t(y_{i-1}^*, y_i^*) + \sum_{i=1}^{n} f_e(y_i^*, i, \mathbf{x}) - \mathbb{E}_{\mathbf{y}} \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

‣ Let's focus on the emission feature expectation:

$\mathbb{E}_{\mathbf{y}} \left[ \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right] = \sum_{\mathbf{y} \in \mathcal{Y}} P(\mathbf{y} \mid \mathbf{x}) \left[ \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right] = \sum_{i=1}^{n} \sum_{\mathbf{y} \in \mathcal{Y}} P(\mathbf{y} \mid \mathbf{x}) f_e(y_i, i, \mathbf{x}) = \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) f_e(s, i, \mathbf{x})$

Page 27:

Forward-Backward Algorithm

‣ How do we compute these marginals $P(y_i = s \mid \mathbf{x})$?

$P(y_i = s \mid \mathbf{x}) = \sum_{y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n} P(\mathbf{y} \mid \mathbf{x})$

‣ What did Viterbi compute? $P(\mathbf{y}_{max} \mid \mathbf{x}) = \max_{y_1, \ldots, y_n} P(\mathbf{y} \mid \mathbf{x})$

‣ Can compute marginals with dynamic programming as well, using forward-backward

Page 28:

Forward-Backward Algorithm

$P(y_3 = 2 \mid \mathbf{x}) = \frac{\text{sum of all paths through state 2 at time 3}}{\text{sum of all paths}}$

Page 29:

Forward-Backward Algorithm

$P(y_3 = 2 \mid \mathbf{x}) = \frac{\text{sum of all paths through state 2 at time 3}}{\text{sum of all paths}}$

‣ Easiest and most flexible to do one pass to compute the forward quantities $\alpha$ and one to compute the backward quantities $\beta$

slide credit: Dan Klein

Page 30:

Forward-Backward Algorithm

‣ Initial: $\alpha_1(s) = \exp(\phi_e(s, 1, \mathbf{x}))$

‣ Recurrence: $\alpha_t(s_t) = \sum_{s_{t-1}} \alpha_{t-1}(s_{t-1}) \exp(\phi_e(s_t, t, \mathbf{x})) \exp(\phi_t(s_{t-1}, s_t))$

‣ Same as Viterbi but summing instead of maxing!

‣ These quantities get very small! Store everything as log probabilities

Page 31:

Forward-Backward Algorithm

‣ Initial: $\beta_n(s) = 1$

‣ Recurrence: $\beta_t(s_t) = \sum_{s_{t+1}} \beta_{t+1}(s_{t+1}) \exp(\phi_e(s_{t+1}, t+1, \mathbf{x})) \exp(\phi_t(s_t, s_{t+1}))$

‣ Big difference: count the emission for the next timestep (not the current one)

Page 32:

Forward-Backward Algorithm

$\alpha_1(s) = \exp(\phi_e(s, 1, \mathbf{x}))$; $\alpha_t(s_t) = \sum_{s_{t-1}} \alpha_{t-1}(s_{t-1}) \exp(\phi_e(s_t, t, \mathbf{x})) \exp(\phi_t(s_{t-1}, s_t))$

$\beta_n(s) = 1$; $\beta_t(s_t) = \sum_{s_{t+1}} \beta_{t+1}(s_{t+1}) \exp(\phi_e(s_{t+1}, t+1, \mathbf{x})) \exp(\phi_t(s_t, s_{t+1}))$

$P(s_3 = 2 \mid \mathbf{x}) = \frac{\alpha_3(2) \beta_3(2)}{\sum_i \alpha_3(i) \beta_3(i)}$

‣ What is the denominator here? $P(\mathbf{x})$

‣ Does this explain why beta is what it is?

Page 33:

Computing Marginals

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

‣ Normalizing constant: $Z = \sum_{\mathbf{y}} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

($Z$ for CRFs, $P(\mathbf{x})$ for HMMs)

‣ For both HMMs and CRFs:

$P(y_i = s \mid \mathbf{x}) = \frac{\mathrm{forward}_i(s)\, \mathrm{backward}_i(s)}{\sum_{s'} \mathrm{forward}_i(s')\, \mathrm{backward}_i(s')}$

‣ The denominator is analogous to $P(\mathbf{x})$ for HMMs

Page 34:

Posteriors vs. Probabilities

$P(y_i = s \mid \mathbf{x}) = \frac{\mathrm{forward}_i(s)\, \mathrm{backward}_i(s)}{\sum_{s'} \mathrm{forward}_i(s')\, \mathrm{backward}_i(s')}$

‣ HMM: the probabilities $P(x_i \mid y_i), P(y_i \mid y_{i-1})$ are model parameters (usually multinomial distributions); the posteriors $P(y_i \mid \mathbf{x}), P(y_{i-1}, y_i \mid \mathbf{x})$ are inferred quantities from forward-backward

‣ CRF: probabilities of that form are undefined (the model is by definition conditioned on x); the posteriors are still inferred quantities from forward-backward

‣ The posterior is derived from the parameters and the data (conditioned on x!)

Page 35:

Training CRFs

‣ For emission features:

$\frac{\partial}{\partial w} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=1}^{n} f_e(y_i^*, i, \mathbf{x}) - \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) f_e(s, i, \mathbf{x})$

(gold features minus expected features under the model)

‣ Transition features: need to compute $P(y_i = s_1, y_{i+1} = s_2 \mid \mathbf{x})$ using forward-backward as well

‣ …but you can build a pretty good system without learned transition features (use heuristic weights, or just enforce constraints like B-PER -> I-ORG is illegal)
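The "gold features minus expected features" form is mechanical once the marginals are in hand. A sketch with illustrative names (`marginals[i][s]` standing for $P(y_i = s \mid \mathbf{x})$ from forward-backward; the one-feature `feats` function is a stand-in):

```python
from collections import defaultdict

def emission_gradient(gold_tags, marginals, emission_features, states):
    # dL/dw = sum_i f_e(y*_i, i, x)  -  sum_i sum_s P(y_i = s | x) f_e(s, i, x)
    grad = defaultdict(float)
    for i, gold in enumerate(gold_tags):
        for feat in emission_features(gold, i):
            grad[feat] += 1.0                      # gold features
        for s in states:
            for feat in emission_features(s, i):
                grad[feat] -= marginals[i][s]      # expected features under the model
    return grad

# Toy one-feature-per-(position, tag) extractor (illustrative only)
feats = lambda tag, i: ["pos%d&%s" % (i, tag)]

# If the model already puts all its mass on the gold tags, the gradient is zero
marginals = [{"O": 1.0, "B": 0.0}, {"O": 0.0, "B": 1.0}]
grad = emission_gradient(["O", "B"], marginals, feats, ["O", "B"])
print(max(abs(v) for v in grad.values()))  # 0.0
```

The same pattern handles transition features, with pairwise marginals $P(y_i = s_1, y_{i+1} = s_2 \mid \mathbf{x})$ in place of the unary ones.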

Page 36:

CRFs Outline

‣ Model: $P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$

‣ Inference: argmax P(y|x) from Viterbi

‣ Learning: run forward-backward to compute posterior probabilities; then

$\frac{\partial}{\partial w} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=1}^{n} f_e(y_i^*, i, \mathbf{x}) - \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) f_e(s, i, \mathbf{x})$

Page 37:

Pseudocode

for each epoch
    for each example
        extract features on each emission and transition (look up in cache)
        compute potentials phi based on features + weights
        compute marginal probabilities with forward-backward
        accumulate gradient over all emissions and transitions

Page 38:

Implementation Tips for CRFs

‣ Caching is your friend! Cache feature vectors especially

‣ Try to reduce redundant computation, e.g. if you compute both the gradient and the objective value, don't rerun the dynamic program

‣ If things are too slow, run a profiler and see where time is being spent. Forward-backward should take most of the time

‣ Exploit sparsity where possible, especially in feature vectors and gradients

‣ Do all dynamic program computation in log space to avoid underflow
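For the log-space tip: the sums inside forward-backward become log-sum-exps, which must be computed stably. A standard sketch:

```python
import math

def logsumexp(xs):
    # log(sum(exp(x))) computed stably by factoring out the max,
    # so the largest exponentiated term is exp(0) = 1
    m = max(xs)
    if m == -math.inf:  # all-(-inf) input: log(0)
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Naive math.log(sum(math.exp(x) for x in ...)) would underflow to log(0) here:
print(logsumexp([-1000.0, -1001.0]))
```

With this helper, the alpha/beta recurrences become `logsumexp` over (log alpha + log-potential) terms, and no probability ever leaves log space.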

Page 39:

Debugging Tips for CRFs

‣ Hard to know whether inference, learning, or the model is broken!

‣ Compute the objective: is optimization working?

‣ Learning: is the objective going down? Try to fit 1 example / 10 examples. Are you applying the gradient correctly?

‣ Inference: check gradient computation (most likely place for bugs)

‣ Is $\sum_{s} \mathrm{forward}_i(s)\, \mathrm{backward}_i(s)$ the same for all $i$?

‣ Do probabilities normalize correctly + look "reasonable"? (Nearly uniform when untrained, then slowly converging to the right thing)

‣ If the objective is going down but model performance is bad:

‣ Inference: check performance if you decode the training set