Natural Language Processing with Deep Learning
CS224N/Ling284
Christopher Manning and Richard Socher
Lecture 14: Tree Recursive Neural Networks and Constituency Parsing
Lecture Plan
1. Motivation: Compositionality and Recursion
2. Structure prediction with a simple Tree RNN: Parsing
3. Research highlight: Deep Reinforcement Learning for Dialogue Generation
4. Backpropagation through Structure
5. More complex units

Reminders/comments: Learn up on GPUs, Azure, Docker. Assignment 4: Get something working, using a GPU for the milestone. Final project discussions - come meet with us!
1. The spectrum of language in CS
Semantic interpretation of language – Not just word vectors

How can we know when larger units are similar in meaning?
• The snowboarder is leaping over a mogul
• A person on a snowboard jumps into the air

People interpret the meaning of larger text units – entities, descriptive terms, facts, arguments, stories – by semantic composition of smaller elements
Compositionality
Language understanding – & Artificial Intelligence – requires being able to understand bigger
things from knowing about smaller parts
Are languages recursive?
• Cognitively somewhat debatable
• But: recursion is natural for describing language
• [The man from [the company that you spoke with about [the project] yesterday]]
• noun phrase containing a noun phrase containing a noun phrase
• Arguments for now: 1) Helpful in disambiguation:
Is recursion useful?
2) Helpful for some tasks to refer to specific phrases:
• John and Jane went to a big festival. They enjoyed the trip and the music there.
• "they": John and Jane
• "the trip": went to a big festival
• "there": big festival
3) Works better for some tasks to use grammatical tree structure
• It's a powerful prior for language structure
Building on Word Vector Space Models

[Figure: a 2D word vector space (axes x1, x2) with vectors for Monday and Tuesday]
How can we represent the meaning of longer phrases?
• the country of my birth
• the place where I was born

How should we map phrases into a vector space?
By mapping them into the same vector space!

[Figure: the two phrases plotted near France and Germany in the word vector space; each word of "the country of my birth" has its own vector]
Use principle of compositionality: The meaning (vector) of a sentence is determined by (1) the meanings of its words and (2) the rules that combine them.
Models in this section can jointly learn parse trees and compositional vector representations
[Figure: phrase vectors for "the country of my birth" and "the place where I was born" plotted in the same 2D space as Monday, Tuesday, France, and Germany]
Constituency Sentence Parsing: What we want

[Figure: a parse tree for "The cat sat on the mat." with S, VP, PP, NP nodes and a vector at every node]
Learn Structure and Representation

[Figure: the model learns both the tree structure and a vector representation for every internal node (NP, PP, VP, S) of "The cat sat on the mat."]
Recursive vs. recurrent neural networks
• Recursive neural nets require a parser to get tree structure
• Recurrent neural nets cannot capture phrases without prefix context and often capture too much of the last words in the final vector

[Figure: recursive (tree-structured) vs. recurrent (chain-structured) composition of vectors for "the country of my birth"]
From RNNs to CNNs
• RNN: Get compositional vectors for grammatical phrases only
• CNN: Computes vectors for every possible phrase
• Example: "the country of my birth" computes vectors for:
  • the country, country of, of my, my birth, the country of, country of my, of my birth, the country of my, country of my birth
• Regardless of whether each is grammatical - many don't make sense
• Don't need parser
• But maybe not very linguistically or cognitively plausible
Relationship between RNNs and CNNs

[Figure: CNN vs. RNN composition over the sentence "people there speak slowly"]
2. Recursive Neural Networks for Structure Prediction

[Figure: a neural network takes the vectors of two candidate children (e.g. from "on the mat.") and outputs a parent vector plus a score of 1.3]

Inputs: two candidate children's representations
Outputs:
1. The semantic representation if the two nodes are merged.
2. Score of how plausible the new node would be.
Recursive Neural Network Definition

score = U^T p
p = tanh(W [c1; c2] + b)

Same W parameters at all nodes of the tree

[Figure: the network maps child vectors c1 and c2 to the parent vector p (the candidate node's representation) and its score]
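To make the unit concrete, here is a minimal numpy sketch of the composition/scoring step above; the function and parameter names are illustrative, not the course's reference implementation.

```python
import numpy as np

def combine(c1, c2, W, b, U):
    """Merge two child vectors into a parent vector and a plausibility score.

    c1, c2: child vectors of dimension n
    W: (n, 2n) composition matrix, b: (n,) bias, U: (n,) scoring vector
    """
    children = np.concatenate([c1, c2])   # [c1; c2]
    p = np.tanh(W @ children + b)         # p = tanh(W [c1; c2] + b)
    score = float(U @ p)                  # score = U^T p
    return p, score

# Toy usage with random parameters for n = 4
n = 4
rng = np.random.default_rng(0)
W, b, U = rng.normal(size=(n, 2 * n)), np.zeros(n), rng.normal(size=n)
p, score = combine(rng.normal(size=n), rng.normal(size=n), W, b, U)
```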
Parsing a sentence with an RNN

[Figure sequence: greedy parsing of "The cat sat on the mat.": the network scores every adjacent pair of candidate nodes (e.g. 0.1, 0.4, 2.3, 3.1, 0.3), the best-scoring pair is merged into a new node, the candidates are rescored, and the process repeats until a full tree is built]
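A hedged sketch of the greedy procedure illustrated above, assuming the combine unit from the previous sketch and a simple Node class (both illustrative, not the actual course code):

```python
import numpy as np

class Node:
    def __init__(self, vec, left=None, right=None, word=None):
        self.vec, self.left, self.right, self.word = vec, left, right, word

def combine(c1, c2, W, b, U):
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)
    return p, float(U @ p)

def greedy_parse(word_vectors, W, b, U):
    """Repeatedly merge the adjacent pair with the highest score until one tree remains."""
    nodes = [Node(v, word=w) for w, v in word_vectors]
    while len(nodes) > 1:
        # score every adjacent candidate pair
        candidates = []
        for i in range(len(nodes) - 1):
            p, s = combine(nodes[i].vec, nodes[i + 1].vec, W, b, U)
            candidates.append((s, i, p))
        s, i, p = max(candidates, key=lambda t: t[0])          # pick the best-scoring merge
        nodes[i:i + 2] = [Node(p, left=nodes[i], right=nodes[i + 1])]
    return nodes[0]
```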
Max-Margin Framework - Details
• The score of a tree is computed by the sum of the parsing decision scores at each node: s(x, y) = Σ_{n ∈ nodes(y)} s(n)
• x is the sentence; y is the parse tree

[Figure: the RNN unit assigning a score (1.3) to a candidate node]
Max-Margin Framework - Details
• Similar to max-margin parsing (Taskar et al. 2004), a supervised max-margin objective (sketched below)
• The loss penalizes all incorrect decisions
• Structure search for A(x) was greedy (join best nodes each time)
• Instead: Beam search with chart
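The objective itself did not survive the slide extraction; the standard max-margin formulation for this model (in the spirit of Taskar et al. 2004) maximizes roughly the following, where A(x_i) is the set of trees the search can reach and Δ(y, y_i) counts incorrect decisions:

\[
J(\theta) \;=\; \sum_i \Big( s(x_i, y_i) \;-\; \max_{y \in A(x_i)} \big( s(x_i, y) + \Delta(y, y_i) \big) \Big)
\]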
Backpropagation Through Structure
Introduced by Goller & Küchler (1996)
Principally the same as general backpropagation
Three differences resulting from the recursion and tree structure:
1. Sum derivatives of W from all nodes (like RNN)
2. Split derivatives at each node (for tree)
3. Add error messages from parent + node itself
[Excerpt from the accompanying backpropagation lecture notes:]

The second derivative in eq. 28 for output units is simply

\[
\frac{\partial a^{(n_l)}_i}{\partial W^{(n_l-1)}_{ij}}
= \frac{\partial}{\partial W^{(n_l-1)}_{ij}} z^{(n_l)}_i
= \frac{\partial}{\partial W^{(n_l-1)}_{ij}} \left( W^{(n_l-1)}_{i\cdot}\, a^{(n_l-1)} \right)
= a^{(n_l-1)}_j. \tag{46}
\]

We adopt standard notation and introduce the error \(\delta\) related to an output unit:

\[
\frac{\partial E_n}{\partial W^{(n_l-1)}_{ij}}
= (y_i - t_i)\, a^{(n_l-1)}_j
= \delta^{(n_l)}_i a^{(n_l-1)}_j. \tag{47}
\]

So far, we only computed errors for output units; now we will derive \(\delta\)'s for normal hidden units and show how these errors are backpropagated to compute weight derivatives of lower levels. We will start with second-to-top layer weights, from which a generalization to arbitrarily deep layers will become obvious. Similar to eq. 28, we start with the error derivative:

\[
\frac{\partial E}{\partial W^{(n_l-2)}_{ij}}
= \sum_n \underbrace{\frac{\partial E_n}{\partial a^{(n_l)}}}_{\delta^{(n_l)}}
\frac{\partial a^{(n_l)}}{\partial W^{(n_l-2)}_{ij}} + \lambda W^{(n_l-2)}_{ji}. \tag{48}
\]

Now,

\begin{align*}
(\delta^{(n_l)})^{\top} \frac{\partial a^{(n_l)}}{\partial W^{(n_l-2)}_{ij}}
&= (\delta^{(n_l)})^{\top} \frac{\partial z^{(n_l)}}{\partial W^{(n_l-2)}_{ij}} \tag{49}\\
&= (\delta^{(n_l)})^{\top} \frac{\partial}{\partial W^{(n_l-2)}_{ij}}\, W^{(n_l-1)} a^{(n_l-1)} \tag{50}\\
&= (\delta^{(n_l)})^{\top} \frac{\partial}{\partial W^{(n_l-2)}_{ij}}\, W^{(n_l-1)}_{\cdot i} a^{(n_l-1)}_i \tag{51}\\
&= (\delta^{(n_l)})^{\top} W^{(n_l-1)}_{\cdot i} \frac{\partial}{\partial W^{(n_l-2)}_{ij}}\, a^{(n_l-1)}_i \tag{52}\\
&= (\delta^{(n_l)})^{\top} W^{(n_l-1)}_{\cdot i} \frac{\partial}{\partial W^{(n_l-2)}_{ij}}\, f(z^{(n_l-1)}_i) \tag{53}\\
&= (\delta^{(n_l)})^{\top} W^{(n_l-1)}_{\cdot i} \frac{\partial}{\partial W^{(n_l-2)}_{ij}}\, f(W^{(n_l-2)}_{i\cdot} a^{(n_l-2)}) \tag{54}\\
&= (\delta^{(n_l)})^{\top} W^{(n_l-1)}_{\cdot i}\, f'(z^{(n_l-1)}_i)\, a^{(n_l-2)}_j \tag{55}\\
&= \left( (\delta^{(n_l)})^{\top} W^{(n_l-1)}_{\cdot i} \right) f'(z^{(n_l-1)}_i)\, a^{(n_l-2)}_j \tag{56}\\
&= \underbrace{\left( \sum_{j=1}^{s_{l+1}} W^{(n_l-1)}_{ji} \delta^{(n_l)}_j \right) f'(z^{(n_l-1)}_i)}_{\delta^{(n_l-1)}_i}\; a^{(n_l-2)}_j \tag{57}\\
&= \delta^{(n_l-1)}_i a^{(n_l-2)}_j \tag{58}
\end{align*}

where we used in the first line that the top layer is linear. This is a very detailed account of essentially just the chain rule.

So, we can write the \(\delta\) errors of all layers \(l\) (except the top layer) in vector format, using the Hadamard product \(\circ\):

\[
\delta^{(l)} = \left( (W^{(l)})^{\top} \delta^{(l+1)} \right) \circ f'(z^{(l)}), \tag{59}
\]

where the sigmoid derivative from eq. 14 gives \(f'(z^{(l)}) = (1 - a^{(l)})\, a^{(l)}\). Using that definition, we get the hidden layer backprop derivatives:

\[
\frac{\partial}{\partial W^{(l)}_{ij}} E_R = a^{(l)}_j \delta^{(l+1)}_i + \lambda W^{(l)}_{ij}, \tag{60}
\]

which in one simplified vector notation becomes:

\[
\frac{\partial}{\partial W^{(l)}} E_R = \delta^{(l+1)} (a^{(l)})^{\top} + \lambda W^{(l)}. \tag{62}
\]

In summary, the backprop procedure consists of four steps:

1. Apply an input \(x_n\) and forward propagate it through the network to get the hidden and output activations using eq. 18.
2. Evaluate \(\delta^{(n_l)}\) for output units using eq. 42.
3. Backpropagate the \(\delta\)'s to obtain a \(\delta^{(l)}\) for each hidden layer in the network using eq. 59.
4. Evaluate the required derivatives with eq. 62 and update all the weights using an optimization procedure such as conjugate gradient or L-BFGS. CG seems to be faster and work better when using mini-batches of training data to estimate the derivatives.

If you have any further questions or found errors, please send an email to [email protected]

5 Recursive Neural Networks

Same as backprop in the previous section, but splitting error derivatives and noting that the derivatives of the same W at each node can all be added up. Lastly, the deltas from the parent node and possible deltas from a softmax classifier at each node are just added.
BTS: 1) Sum derivatives of all nodes

You can actually assume it's a different W at each node
Intuition via example: If we take separate derivatives of each occurrence, we get the same:
BTS: 2) Split derivatives at each node

During forward prop, the parent is computed using 2 children:
p = tanh(W [c1; c2] + b)
Hence, the errors need to be computed with respect to each of them, where each child's error is n-dimensional (the split is sketched below)

[Figure: the parent p's error message is passed back down to children c1 and c2]
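Consistent with eq. 59 in the notes above, the split can be written as follows (a sketch of the step, not the slide's exact notation), where \(\delta_p\) is the parent's error message after going through its tanh:

\[
\delta_{\text{children}} \;=\; W^{\top} \delta_p \;\in\; \mathbb{R}^{2n},
\qquad \delta_{c_1} = \delta_{\text{children}}[1{:}n], \quad \delta_{c_2} = \delta_{\text{children}}[n{+}1{:}2n]
\]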
BTS: 3) Add error messages

• At each node:
• What came up (fprop) must come down (bprop)
• Total error messages = error messages from parent + error message from own score

[Figure: a node's total error combines the message from its parent and the error from its own score]
BTS Python Code: forwardProp
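The code screenshot on this slide did not extract; below is a minimal sketch of what a forward pass through such a tree could look like, assuming numpy and a simple binary-tree Node class (illustrative names, not Richard Socher's original code):

```python
import numpy as np

class Node:
    """Binary tree node: leaves carry a word vector, internal nodes two children."""
    def __init__(self, vec=None, left=None, right=None):
        self.vec, self.left, self.right = vec, left, right
        self.score = 0.0

def forward_prop(node, W, b, U):
    """Recursively compute node vectors (and scores) bottom-up; returns the node's vector."""
    if node.left is None:                   # leaf: its word vector is already set
        return node.vec
    c1 = forward_prop(node.left, W, b, U)
    c2 = forward_prop(node.right, W, b, U)
    node.vec = np.tanh(W @ np.concatenate([c1, c2]) + b)   # p = tanh(W [c1; c2] + b)
    node.score = float(U @ node.vec)                        # score = U^T p
    return node.vec
```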
BTS Python Code: backProp
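Again the original code image is missing; this hedged sketch of backpropagation through structure for the unit above shows the three differences listed earlier (sum W gradients over all nodes, split the incoming delta between the two children, add the node's own error message). It assumes the Node class and numpy setup of the forwardProp sketch, and that each node's score enters the objective with weight 1.

```python
import numpy as np

def back_prop(node, W, b, U, delta_down, grads):
    """delta_down: error message arriving from the parent (zeros at the root).
    grads: dict accumulating 'dW', 'db', 'dU' over all nodes (difference 1: sum over nodes)."""
    if node.left is None:
        return                                   # word-vector gradients omitted for brevity
    # difference 3: add the node's own error (d score / d p = U) to the parent's message
    delta_p = delta_down + U
    delta_z = delta_p * (1.0 - node.vec ** 2)    # back through tanh
    children = np.concatenate([node.left.vec, node.right.vec])
    grads["dW"] += np.outer(delta_z, children)
    grads["db"] += delta_z
    grads["dU"] += node.vec                      # d score / d U = p
    # difference 2: split the error message between the two children
    delta_children = W.T @ delta_z
    n = node.left.vec.shape[0]
    back_prop(node.left, W, b, U, delta_children[:n], grads)
    back_prop(node.right, W, b, U, delta_children[n:], grads)
```

Before calling this on the root, grads would be initialized to zero arrays shaped like W, b, and U, and delta_down to a zero vector.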
BTS: Optimization

• As before, we can plug the gradients into a standard off-the-shelf L-BFGS optimizer or SGD
• Best results with AdaGrad (Duchi et al., 2011; update rule below):
• For non-continuous objective use subgradient method (Ratliff et al. 2007)
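The AdaGrad update referenced above (not shown in the extracted slide) is the standard per-parameter rule:

\[
\theta_{t,i} \;=\; \theta_{t-1,i} \;-\; \frac{\alpha}{\sqrt{\sum_{\tau=1}^{t} g_{\tau,i}^{2}}}\; g_{t,i},
\qquad g_{t,i} = \frac{\partial J_t}{\partial \theta_i}
\]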
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao and Dan Jurafsky
Seq2Seq for Dialogue
Encode previous message(s) into vector
[Figure: the encoder reads "How are you"; the decoder generates "I am fine"]
Decode vector into response
Train by maximizing
p(response|input)
where the response is produced by a human
Problems with Seq2Seq
How old are you?
16?
I’m 16
I don’t know what you’re talking about
You don’t know what you’re saying
I don’t know what you’re talking about
You don’t know what you’re saying
reasonable, but unhelpful
generic
probable response != good response
What is a good response?
• Reasonable: p(response|input) is high according to seq2seq model
• Nonrepetitive: similarity between response and previous messages is low
• Easy to answer: p("i don't know"|response) is low
Scoring function: R(response) = reasonable_score + nonrepetitive_score + easy_to_answer_score
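A hedged Python sketch of how such a reward could be computed, assuming a seq2seq object with a log_prob(output, given=...) method and a sentence-similarity helper; the names and equal weighting are illustrative, not the paper's exact formulation:

```python
def reward(response, input_msg, history, seq2seq, similarity):
    """Combine the three criteria above into a single scalar reward R(response)."""
    reasonable = seq2seq.log_prob(response, given=input_msg)              # p(response|input) high
    nonrepetitive = -max((similarity(response, m) for m in history), default=0.0)
    easy_to_answer = -seq2seq.log_prob("i don't know", given=response)    # dull follow-up unlikely
    return reasonable + nonrepetitive + easy_to_answer
```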
Reinforcement Learning
Learn from rewards instead of from examples
1. Encode input into a vector
[Figure: the encoder reads "How are you"]
2. Have the system generate a response
[Figure: for input "How are you", the decoder generates "I don't know"]
3. Receive reward R(response)
- Train system to maximize reward

[Figure: the response "I don't know" to "How are you" receives reward R = -5]
Quantitative Results
Qualitative Results
How old are you?
I thought you were 12
I’m 16. Why are you asking?
What made you think so?
You don’t know what you’re saying
I don’t know what you’re talking about
You don’t know what you’re saying
Conclusion
• Reinforcement learning useful when we want our model to do more than produce a probable human label
• Many more applications of RL to NLP!
Information extraction, question answering, task-oriented dialogue, coreference resolution, and more
Discussion: Simple RNN
• Decent results with single matrix TreeRNN
• Single weight matrix TreeRNN could capture some phenomena but not adequate for more complex, higher order composition and parsing long sentences
• There is no real interaction between the input words
• The composition function is the same for all syntactic categories, punctuation, etc.

[Figure: the single-matrix unit: children c1 and c2 combined by W into parent p with score s]
Version 2: Syntactically-Untied RNN
• A symbolic Context-Free Grammar (CFG) backbone is adequate for basic syntactic structure
• We use the discrete syntactic categories of the children to choose the composition matrix (formula below)
• A TreeRNN can do better with a different composition matrix for different syntactic environments
• The result gives us a better semantics
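The composition formula is missing from the extraction; in the SU-RNN the matrix is indexed by the children's syntactic categories, roughly:

\[
p \;=\; \tanh\!\left( W^{(B,C)} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + b \right),
\qquad B, C \text{ = syntactic categories of } c_1, c_2
\]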
Compositional Vector Grammars
• Problem: Speed. Every candidate score in beam search needs a matrix-vector product.
• Solution: Compute score only for a subset of trees coming from a simpler, faster model (PCFG)
• Prunes very unlikely candidates for speed
• Provides coarse syntactic categories of the children for each beam candidate
• Compositional Vector Grammar = PCFG + TreeRNN
Details: Compositional Vector Grammar
• Scores at each node computed by combination of PCFG and SU-RNN (see the equation below):
• Interpretation: Factoring discrete and continuous parsing in one model
• Socher et al. (2013)
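The slide's equation did not extract; in Socher et al. (2013) the score of a node p built by rule P → B C combines the SU-RNN score with the log PCFG rule probability, approximately:

\[
s(p) \;=\; \big(v^{(B,C)}\big)^{\!\top} p \;+\; \log P(P \rightarrow B\; C)
\]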
Related work for recursive neural networks

Pollack (1990): Recursive auto-associative memories
Previous Recursive Neural Network work by Goller & Küchler (1996), Costa et al. (2003) assumed fixed tree structure and used one-hot vectors.
Hinton (1990) and Bottou (2011): Related ideas about recursive models and recursive operators as smooth versions of logic operations
Related Work for parsing
• Resulting CVG Parser is related to previous work that extends PCFG parsers
• Klein and Manning (2003a): manual feature engineering
• Petrov et al. (2006): learning algorithm that splits and merges syntactic categories
• Lexicalized parsers (Collins, 2003; Charniak, 2000): describe each category with a lexical item
• Hall and Klein (2012) combine several such annotation schemes in a factored parser
• CVGs extend these ideas from discrete representations to richer continuous ones
Experiments
• Standard WSJ split, labeled F1
• Based on simple PCFG with fewer states
• Fast pruning of search space, few matrix-vector products
• 3.8% higher F1, 20% faster than Stanford factored parser
Parser | Test, All Sentences
Stanford PCFG (Klein and Manning, 2003a) | 85.5
Stanford Factored (Klein and Manning, 2003b) | 86.6
Factored PCFGs (Hall and Klein, 2012) | 89.4
Collins (Collins, 1997) | 87.7
SSN (Henderson, 2004) | 89.4
Berkeley Parser (Petrov and Klein, 2007) | 90.1
CVG (RNN) (Socher et al., ACL 2013) | 85.0
CVG (SU-RNN) (Socher et al., ACL 2013) | 90.4
Charniak - Self Trained (McClosky et al. 2006) | 91.0
Charniak - Self Trained - ReRanked (McClosky et al. 2006) | 92.1
SU-RNN / CVG [Socher, Bauer, Manning, Ng 2013]

Learns soft notion of head words
Initialization:

[Figure: visualizations of the learned composition matrices for category pairs such as NP-CC, NP-PP, PP-NP, PRP$-NP, ADJP-NP, ADVP-ADJP, JJ-NP, DT-NP]
Analysis of resulting vector representations

Each sentence is shown with its two nearest neighbors in the learned vector space:

All the figures are adjusted for seasonal variations
1. All the numbers are adjusted for seasonal fluctuations
2. All the figures are adjusted to remove usual seasonal patterns

Knight-Ridder wouldn't comment on the offer
1. Harsco declined to say what country placed the order
2. Coastal wouldn't disclose the terms

Sales grew almost 7% to $UNK m. from $UNK m.
1. Sales rose more than 7% to $94.9 m. from $88.3 m.
2. Sales surged 40% to UNK b. yen from UNK b.
SU-RNN Analysis
• Can transfer semantic information from single related example
• Train sentences:
  • He eats spaghetti with a fork.
  • She eats spaghetti with pork.
• Test sentences:
  • He eats spaghetti with a spoon.
  • He eats spaghetti with meat.
Labeling in Recursive Neural Networks
• We can use each node's representation as features for a softmax classifier (see below):
• Training similar to model in part 1 with standard cross-entropy error + scores
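The classifier equation is missing from the extraction; on each node it has the standard form (W_s is the softmax weight matrix, p the node's vector):

\[
\hat{y}_p \;=\; \mathrm{softmax}(W_s\, p)
\]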
[Figure: a softmax layer on top of a node's vector predicts its label, e.g. NP]
Version 3: Compositionality Through Recursive Matrix-Vector Spaces

One way to make the composition function more powerful was by untying the weights W
But what if words act mostly as an operator, e.g. "very" in "very good"
Proposal: A new composition function

Before: p = tanh(W [c1; c2] + b)
Version 3: Matrix-vector RNNs [Socher, Huval, Bhat, Manning & Ng, 2012]
Compositionality Through Recursive Matrix-Vector Recursive Neural Networks

Before: p = tanh(W [c1; c2] + b)
Now: p = tanh(W [C2 c1; C1 c2] + b)
Matrix-vector RNNs [Socher, Huval, Bhat, Manning & Ng, 2012]

Each word and phrase is represented by both a vector and a matrix. Combining children with matrices A and B yields the parent vector p (as above) and a parent matrix P = W_M [A; B]
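A minimal numpy sketch of this matrix-vector composition; the parameter names (W, b, Wm) are illustrative:

```python
import numpy as np

def mv_compose(c1, C1, c2, C2, W, b, Wm):
    """MV-RNN composition: each child has a vector (c) and an operator matrix (C)."""
    p = np.tanh(W @ np.concatenate([C2 @ c1, C1 @ c2]) + b)  # each vector transformed by the other's matrix
    P = Wm @ np.vstack([C1, C2])                              # parent matrix from the stacked child matrices
    return p, P
```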
Predicting Sentiment Distributions
Good example for non-linearity in language
Classification of Semantic Relationships
• Can an MV-RNN learn how a large syntactic context conveys a semantic relationship?
• My [apartment]e1 has a pretty large [kitchen]e2 → component-whole relationship (e2, e1)
• Build a single compositional semantics for the minimal constituent including both terms
Classification of Semantic Relationships
Classifier | Features | F1
SVM | POS, stemming, syntactic patterns | 60.1
MaxEnt | POS, WordNet, morphological features, noun compound system, thesauri, Google n-grams | 77.6
SVM | POS, WordNet, prefixes, morphological features, dependency parse features, Levin classes, PropBank, FrameNet, NomLex-Plus, Google n-grams, paraphrases, TextRunner | 82.2
RNN | – | 74.8
MV-RNN | – | 79.1
MV-RNN | POS, WordNet, NER | 82.4
Scene Parsing
• The meaning of a scene image is also a function of smaller regions,
• how they combine as parts to form larger objects,
• and how the objects interact.
Similar principle of compositionality.
Algorithm for Parsing Images

Same Recursive Neural Network as for natural language parsing! (Socher et al. ICML 2011)

[Figure: parsing natural scene images: image segments (grass, tree, people, building) are mapped from features to semantic representations and merged into a scene parse]
Multi-class segmentation

Method | Accuracy
Pixel CRF (Gould et al., ICCV 2009) | 74.3
Classifier on superpixel features | 75.9
Region-based energy (Gould et al., ICCV 2009) | 76.4
Local labelling (Tighe & Lazebnik, ECCV 2010) | 76.9
Superpixel MRF (Tighe & Lazebnik, ECCV 2010) | 77.5
Simultaneous MRF (Tighe & Lazebnik, ECCV 2010) | 77.5
Recursive Neural Network | 78.1

Stanford Background Dataset (Gould et al. 2009)
QCD-Aware Recursive Neural Networks for Jet Physics
Gilles Louppe, Kyunghyun Cho, Cyril Becot, Kyle Cranmer