Compositionality in Recursive Neural Networks
Martha Lewis
ILLC, University of Amsterdam
SYCO3, March 2019
Oxford, UK
Outline
Compositional distributional semantics
Pregroup grammars and how to map to vector spaces
Recursive neural networks (TreeRNNs)
Mapping pregroup grammars to TreeRNNs
Implications
Compositional Distributional Semantics
Frege’s principle of compositionality
The meaning of a complex expression is determined by the meanings of its parts
and the rules used for combining them.
Distributional hypothesis
Words that occur in similar contexts have similar meanings [Harris, 1958].
Symbolic Structure
A pregroup algebra is a partially ordered monoid, where each element p has a left and a right adjoint such that:
p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l
Elements of the pregroup are basic (atomic) grammatical types, e.g. B = {n, s}.
Atomic grammatical types can be combined to form types of higher order (e.g. n · n^l or n^r · s · n^l).
A sentence w1 w2 . . . wn (with word wi of type ti) is grammatical whenever:
t1 · t2 · . . . · tn ≤ s
Pregroup derivation: example
p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l
[Parse tree: [S [NP [Adj trembling] [N shadows]] [VP [V play] [N hide-and-seek]]]]
trembling shadows play hide-and-seek
n · n^l        n        n^r · s · n^l        n

n · n^l · n · n^r · s · n^l · n ≤ n · 1 · n^r · s · 1 = n · n^r · s ≤ 1 · s = s
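As a rough illustration of this reduction, here is a minimal Python sketch (not from the slides) that greedily cancels adjacent p · p^r and p^l · p pairs; greedy cancellation is not a full pregroup parser, but it suffices for the example above.

```python
# Minimal sketch (not from the slides): check the reduction above by greedily
# cancelling adjacent p · p^r and p^l · p pairs. Atoms are (base, adjoint) pairs,
# e.g. ("n", "") for n, ("n", "l") for n^l, ("n", "r") for n^r.

def reduce_types(atoms):
    atoms = list(atoms)
    changed = True
    while changed:
        changed = False
        for i in range(len(atoms) - 1):
            (b1, a1), (b2, a2) = atoms[i], atoms[i + 1]
            # p · p^r <= 1   and   p^l · p <= 1
            if b1 == b2 and (a1, a2) in {("", "r"), ("l", "")}:
                del atoms[i:i + 2]
                changed = True
                break
    return atoms

# trembling shadows play hide-and-seek :  n·n^l  n  n^r·s·n^l  n
sentence = [("n", ""), ("n", "l"), ("n", ""), ("n", "r"),
            ("s", ""), ("n", "l"), ("n", "")]
print(reduce_types(sentence))  # [("s", "")]: reduces to s, so the sentence is grammatical
```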
Distributional Semantics
Words are represented as vectors.
Entries of the vector represent how often the target word co-occurs with the context word.
[Figure: co-occurrence counts for "iguana" against the context words cuddly, smelly, scaly, teeth, cute (counts 1, 10, 15, 7, 2), and a plot locating "iguana" and "Wilbur" in a space with axes such as "scaly" and "cuddly"]
Similarity is given by the cosine of the angle between the vectors:
sim(v, w) = cos(θ_{v,w}) = ⟨v, w⟩ / (||v|| ||w||)
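As a quick sketch (not from the slides), the same similarity in NumPy; the count vectors below are invented for illustration.

```python
# Cosine similarity between two (made-up) co-occurrence count vectors.
import numpy as np

def cosine(v, w):
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

# hypothetical counts over context words (cuddly, smelly, scaly, teeth, cute)
iguana = np.array([1.0, 10.0, 15.0, 7.0, 2.0])
wilbur = np.array([12.0, 2.0, 1.0, 5.0, 9.0])
print(cosine(iguana, wilbur))
```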
The role of compositionality
Compositional distributional models
We can produce a sentence vector by composing the vectors of the words in that sentence.
s = f(w1, w2, . . . , wn)
Three generic classes of CDMs:
Vector mixture models [Mitchell and Lapata (2010)]
Tensor-based models [Coecke, Sadrzadeh, Clark (2010); Baroni and Zamparelli (2010)]
Neural models [Socher et al. (2012); Kalchbrenner et al. (2014)]
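To make the first of these classes concrete, here is a minimal sketch (not from the slides) of the vector-mixture compositions of Mitchell and Lapata (2010), using invented word vectors.

```python
# Vector mixture composition: pointwise addition and pointwise multiplication.
import numpy as np

clowns = np.array([0.2, 0.7, 0.1])   # made-up word vectors
tell   = np.array([0.5, 0.3, 0.9])
jokes  = np.array([0.4, 0.6, 0.2])

s_add  = clowns + tell + jokes       # additive model
s_mult = clowns * tell * jokes       # multiplicative model
print(s_add, s_mult)
```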
A multi-linear model
The grammatical type of a word defines the vector space in which the word lives:
Nouns are vectors in N;
adjectives are linear maps N → N, i.e. elements in N ⊗ N;
intransitive verbs are linear maps N → S, i.e. elements in N ⊗ S;
transitive verbs are bi-linear maps N ⊗ N → S, i.e. elements of N ⊗ S ⊗ N;
The composition operation is tensor contraction, i.e. elimination of matching dimensions by application of the inner product.
Coecke, Sadrzadeh, Clark 2010
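As a small sketch of what tensor contraction looks like in code (not from the slides; dimensions and values are made up), an adjective is a matrix in N ⊗ N and a transitive verb an order-3 tensor in N ⊗ S ⊗ N:

```python
# Composition by tensor contraction for "trembling shadows play hide-and-seek".
import numpy as np

n_dim, s_dim = 4, 3
rng = np.random.default_rng(0)

shadows       = rng.random(n_dim)                  # noun in N
trembling     = rng.random((n_dim, n_dim))         # adjective in N ⊗ N
play          = rng.random((n_dim, s_dim, n_dim))  # transitive verb in N ⊗ S ⊗ N
hide_and_seek = rng.random(n_dim)                  # noun in N

subject  = trembling @ shadows                     # apply the adjective: N -> N
sentence = np.einsum("i,isj,j->s", subject, play, hide_and_seek)  # contract both N dimensions
print(sentence.shape)  # (3,): a vector in the sentence space S
```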
Diagrammatic calculus: Summary
[Diagrams: morphisms as boxes on wires; tensors as states of V, V ⊗ W, V ⊗ W ⊗ Z; the ε-map and η-map drawn as a cup and a cap; the snake equation (ε^r_A ⊗ 1_A) ∘ (1_A ⊗ η^r_A) = 1_A]
Diagrammatic calculus: example
[Diagram: "trembling shadows play hide-and-seek" with its parse tree (Adj N)(V N) and word wires typed N·N^l, N, N^r·S·N^l, N; cups contract the matching N wires, leaving a single S wire]
The sentence meaning is F(α)(trembling ⊗ shadows ⊗ play ⊗ hide-and-seek): the tensor product ⊗_i wi of the word representations, followed by the linear map F(α) that F assigns to the grammatical reduction α.
Recursive Neural Networks
[Tree: "Clowns tell jokes", composed bottom-up]
p1 = g(tell, jokes)
p2 = g(Clowns, p1)
g_RNN : R^n × R^n → R^n :: (v1, v2) ↦ f1(M · [v1; v2])

g_RNTN : R^n × R^n → R^n :: (v1, v2) ↦ g_RNN(v1, v2) + f2(v1^T · T · v2)
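A minimal NumPy sketch of these two composition functions (not from the slides; tanh is assumed for the nonlinearities f1 and f2, and the parameters are random):

```python
import numpy as np

n = 4
rng = np.random.default_rng(1)
M = rng.standard_normal((n, 2 * n))   # TreeRNN weight matrix
T = rng.standard_normal((n, n, n))    # RNTN tensor, one n x n slice per output dimension

def g_rnn(v1, v2):
    return np.tanh(M @ np.concatenate([v1, v2]))

def g_rntn(v1, v2):
    bilinear = np.einsum("i,kij,j->k", v1, T, v2)   # v1^T · T · v2, slice by slice
    return g_rnn(v1, v2) + np.tanh(bilinear)

tell, jokes, clowns = rng.standard_normal((3, n))
p1 = g_rntn(tell, jokes)     # compose "tell jokes"
p2 = g_rntn(clowns, p1)      # compose "Clowns [tell jokes]"
print(p2)
```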
How compositional is this?
Successful
Some element of grammatical structure
The compositionality function has to do everything
Does that help us understand what’s going on?
Information-routing words
[Tree: "Clowns who tell jokes"; in the TreeRNN, the relative pronoun "who" is just another word vector fed to g]
[Tree: "John introduces himself"; the reflexive "himself" likewise enters as an ordinary word vector]
Can we map pregroup grammar onto TreeRNNs?
[Tree: "Clowns tell jokes" again]
p1 = g(tell, jokes)
p2 = g(Clowns, p1)
Can we map pregroup grammar onto TreeRNNs?
[Tree: "Clowns tell jokes", now composed with the tensor-based function g_LinTen]
p1 = g_LinTen(tell, jokes)
p2 = g_LinTen(Clowns, p1)
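Assuming g_LinTen is the purely multilinear variant of the composition above (the nonlinearities and the matrix term dropped, leaving only a tensor contraction), a sketch might look like:

```python
# Hypothetical g_LinTen: bilinear composition with no nonlinearity, so each
# application is just a tensor contraction, as in the pregroup/tensor picture.
import numpy as np

n = 4
T = np.random.default_rng(2).standard_normal((n, n, n))

def g_lin_ten(v1, v2):
    return np.einsum("i,kij,j->k", v1, T, v2)

tell, jokes, clowns = np.random.default_rng(3).standard_normal((3, n))
p1 = g_lin_ten(tell, jokes)
p2 = g_lin_ten(clowns, p1)
```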
Can we map pregroup grammar onto TreeRNNs?
[Tree: "Clowns tell jokes" with a g_LinTen box at each internal node]
Why?
Opens up more possibilities to use tools from formal semantics in computational linguistics.
We can immediately see possibilities for building alternative networks - perhaps different compositionality functions for different parts of speech.
Decomposing the tensors for functional words into repeated applications of a compositionality function gives options for learning representations.
Why?
who : n^r n s^l s
[String diagrams: "dragons who breathe fire", with "who" contributing only wire re-routing, equals the diagram for "dragons breathe fire"]
Why?
himself : n s^r n^rr n^r s
[String diagrams: "John loves himself" reduces to a diagram containing only the words "John" and "loves", with "himself" contributing only wire re-routing]
Experiments?
Not yet, but there are a number of avenues for exploration:
Comparing the performance of this kind of model against standard categorical compositional distributional models
Different compositionality functions for different word types
Testing the performance of TreeRNNs with formally analyzed information-routing words.
Investigating the effects of switching between word types.
Investigating meanings of logical words and quantifiers.
Extending the analysis to other types of recurrent neural network, such as long short-term memory networks or gated recurrent units.
Summary
We have shown how to interpret a simplification of recursive neural networks within a formal semantics framework
We can then analyze ‘information routing’ words such as pronouns as specific functions rather than as vectors
This also provides a simplification of tensor-based vector composition architectures, reducing the number of high-order tensors to be learnt, and making representations more flexible and reusable.
Plenty of work to do on both the experimental and thetheoretical side!
Thanks!
NWO Veni grant ‘Metaphorical Meanings for Artificial Agents’
Category-Theoretic Background
The category of pregroups Preg and the category of finite-dimensional vector spaces FdVect are both compact closed
This means that they share a structure, namely:
Both have a tensor product ⊗ with a unit 1
Both have adjoints A^r, A^l
Both have special morphisms
ε^r : A ⊗ A^r → 1,    ε^l : A^l ⊗ A → 1
η^r : 1 → A^r ⊗ A,    η^l : 1 → A ⊗ A^l
These morphisms interact in a certain way.
In Preg:
p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l
A functor from syntax to semantics
We define a functor F : Preg → FdVect such that:
F(p) = P   ∀ p ∈ B
F(1) = ℝ
F(p · q) = F(p) ⊗ F(q)
F(p^r) = F(p^l) = F(p)
F(p ≤ q) = F(p) → F(q)
F(ε^r) = F(ε^l) = inner product in FdVect
F(η^r) = F(η^l) = identity maps in FdVect
[Kartsaklis, Sadrzadeh, Pulman and Coecke, 2016]
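As a toy illustration (not from the slides), the object part of such a functor can be read as assigning a tensor shape to every pregroup type; the dimensions below are invented.

```python
# F on objects: atomic types go to (made-up) spaces, adjoints to the same space,
# and concatenation of types to the tensor product of spaces.
SPACE_DIM = {"n": 4, "s": 3}

def F(pregroup_type):
    # F(p^r) = F(p^l) = F(p)  and  F(p · q) = F(p) ⊗ F(q)
    return tuple(SPACE_DIM[base] for base, _adj in pregroup_type)

transitive_verb = [("n", "r"), ("s", ""), ("n", "l")]
print(F(transitive_verb))  # (4, 3, 4): an element of N ⊗ S ⊗ N
```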