
Page 1

Compositionality in Recursive Neural Networks

Martha Lewis

ILLC, University of Amsterdam

SYCO3, March 2019

Oxford, UK


Page 2

Outline

Compositional distributional semantics

Pregroup grammars and how to map to vector spaces

Recursive neural networks (TreeRNNs)

Mapping pregroup grammars to TreeRNNs

Implications


Page 3

Compositional Distributional Semantics

Frege’s principle of compositionality

The meaning of a complex expression is determined by the meanings of its parts and the rules used for combining them.


Page 4

Compositional Distributional Semantics

Frege’s principle of compositionality

The meaning of a complex expression is determined by the meanings of its parts and the rules used for combining them.

Distributional hypothesis

Words that occur in similar contexts have similar meanings [Harris, 1958].


Page 5

Symbolic Structure

A pregroup algebra is a partially ordered monoid in which each element p has a left and a right adjoint such that:

p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l

Elements of the pregroup are basic (atomic) grammatical types, e.g. B = {n, s}.

Atomic grammatical types can be combined to form types of higher order (e.g. n · n^l or n^r · s · n^l).

A sentence w1 w2 … wn (with word wi of type ti) is grammatical whenever:

t1 · t2 · … · tn ≤ s
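This check is mechanical, so a toy implementation may make it concrete. The sketch below is mine, not the slides': it encodes a type as (base, adjoint-degree) pairs, with left adjoints at degree -1 and right adjoints at +1, so both contraction rules take the single form "degree k followed by degree k + 1 on the same base".

```python
# Toy pregroup reduction (an illustrative sketch, not from the slides).
# 'nl' means n^l, 'nr' means n^r, 'nrr' means n^rr, and so on.

def parse(typestring):
    """'n nl n nr s nl n' -> [('n', 0), ('n', -1), ('n', 0), ('n', 1), ...]"""
    return [(tok[0], tok.count('r') - tok.count('l')) for tok in typestring.split()]

def reduce_type(pairs):
    """Greedy stack-based contraction: (x, k)(x, k+1) -> 1 covers both
    p·p^r ≤ 1 (k = 0) and p^l·p ≤ 1 (k = -1).  Good enough for the
    simple types used on these slides."""
    stack = []
    for base, deg in pairs:
        if stack and stack[-1] == (base, deg - 1):
            stack.pop()                  # contract the adjacent adjoint pair
        else:
            stack.append((base, deg))
    return stack

def grammatical(typestring):
    return reduce_type(parse(typestring)) == [('s', 0)]
```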


Page 6

Pregroup derivation: example

p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l

[Parse tree: [S [NP [Adj trembling] [N shadows]] [VP [V play] [N hide-and-seek]]]]

trembling shadows play hide-and-seek
n·n^l        n        n^r·s·n^l        n

n · n^l · n · n^r · s · n^l · n ≤ n · 1 · n^r · s · 1 = n · n^r · s ≤ 1 · s = s
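As a sanity check, the toy reducer sketched on the previous slide reproduces this derivation:

```python
# trembling: n·n^l, shadows: n, play: n^r·s·n^l, hide-and-seek: n
print(reduce_type(parse('n nl n nr s nl n')))   # [('s', 0)], i.e. the sentence type s
print(grammatical('n nl n nr s nl n'))          # True
```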


Page 7

Distributional Semantics

Words are represented as vectors. Entries of the vector represent how often the target word co-occurs with the corresponding context word.

iguana = (cuddly: 1, smelly: 10, scaly: 15, teeth: 7, cute: 2)

[Plot: iguana and Wilbur as points in a space with axes scaly, cuddly, smelly]

Similarity is given by the cosine of the angle between the vectors:

sim(v, w) = cos(θ_{v,w}) = ⟨v, w⟩ / (‖v‖ ‖w‖)
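A minimal NumPy sketch of the measure; the iguana counts follow the example above, while Wilbur's counts are invented purely for illustration:

```python
import numpy as np

def sim(v, w):
    """Cosine similarity <v, w> / (||v|| ||w||)."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

# Contexts: (cuddly, smelly, scaly, teeth, cute)
iguana = np.array([1.0, 10.0, 15.0, 7.0, 2.0])
wilbur = np.array([12.0, 3.0, 1.0, 2.0, 9.0])   # made-up counts for illustration
print(sim(iguana, wilbur))                      # in [0, 1] for count vectors
```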


Page 8

The role of compositionality

Compositional distributional models

We can produce a sentence vector by composing the vectors of the words in that sentence.

s = f(w1, w2, …, wn)

Three generic classes of CDMs:

Vector mixture models [Mitchell and Lapata (2010)]

Tensor-based models [Coecke, Sadrzadeh, Clark (2010); Baroni and Zamparelli (2010)]

Neural models [Socher et al. (2012); Kalchbrenner et al. (2014)]
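The first class is simple enough to sketch inline. Vector mixtures compose by element-wise addition or multiplication (in the spirit of Mitchell and Lapata), which ignores word order and grammar entirely; that limitation is what motivates the other two classes:

```python
import numpy as np

def additive(*word_vectors):
    return np.sum(word_vectors, axis=0)     # s = w1 + w2 + ... + wn

def multiplicative(*word_vectors):
    return np.prod(word_vectors, axis=0)    # s = w1 * w2 * ... * wn (element-wise)
```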


Page 9

A multi-linear model

The grammatical type of a word defines the vector space in which the word lives:

Nouns are vectors in N;

adjectives are linear maps N → N, i.e. elements of N ⊗ N;

intransitive verbs are linear maps N → S, i.e. elements of N ⊗ S;

transitive verbs are bilinear maps N ⊗ N → S, i.e. elements of N ⊗ S ⊗ N.

The composition operation is tensor contraction, i.e. elimination of matching dimensions by application of the inner product.

Coecke, Sadrzadeh, Clark 2010
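A minimal NumPy sketch of tensor-based composition; the dimensions and the random tensors are stand-ins for learned representations:

```python
import numpy as np

dN, dS = 4, 3                                    # illustrative dimensions for N and S
noun       = np.random.rand(dN)                  # vector in N
adjective  = np.random.rand(dN, dN)              # element of N ⊗ N
intrans    = np.random.rand(dN, dS)              # element of N ⊗ S
trans_verb = np.random.rand(dN, dS, dN)          # element of N ⊗ S ⊗ N

adj_noun = np.einsum('ij,j->i', adjective, noun)           # adjective applied to noun
iv_sent  = np.einsum('i,is->s', noun, intrans)             # noun · intransitive verb
subj, obj = np.random.rand(dN), np.random.rand(dN)
tv_sent  = np.einsum('i,isj,j->s', subj, trans_verb, obj)  # subject · verb · object
print(tv_sent.shape)                                       # (3,): a vector in S
```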


Page 10

Diagrammatic calculus: Summary

[Diagrams: a morphism f : A → B as a box on a wire; elements of V, V ⊗ W and V ⊗ W ⊗ Z as boxes with one, two and three output wires (morphisms and tensors)]

[Diagrams: the ε-map as a cap joining A and A^r, the η-map as a cup producing A^r and A, and the snake equation straightening a bent wire:]

(ε^r_A ⊗ 1_A) ∘ (1_A ⊗ η^r_A) = 1_A


Page 11

Diagrammatic calculus: example

trembling shadows play hide-and-seek

[Parse tree α: [S [NP [Adj trembling] [N shadows]] [VP [V play] [N hide-and-seek]]]]

F sends the word types to the spaces N N^l, N, N^r S N^l, N:

⊗_i wi ↦ F(α)(trembling ⊗ shadows ⊗ play ⊗ hide-and-seek)

[Diagram: the four word tensors side by side, with cups contracting the matching N wires and a single S wire left open]


Page 12

Recursive Neural Networks

[Tree, composed bottom-up: (Clowns (tell jokes))]

p1 = g(tell, jokes)
p2 = g(Clowns, p1)

g_RNN : R^n × R^n → R^n :: (v1, v2) ↦ f1(M · [v1; v2])

g_RNTN : R^n × R^n → R^n :: (v1, v2) ↦ g_RNN(v1, v2) + f2(v1^T · T · v2)
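In NumPy, with tanh standing in for the squashing functions f1 and f2 and random weights standing in for trained parameters, the two composition functions look as follows:

```python
import numpy as np

n = 4
M = np.random.rand(n, 2 * n) - 0.5    # RNN weight matrix
T = np.random.rand(n, n, n) - 0.5     # RNTN tensor: one n×n slice per output dimension

def g_rnn(v1, v2):
    return np.tanh(M @ np.concatenate([v1, v2]))

def g_rntn(v1, v2):
    bilinear = np.einsum('i,kij,j->k', v1, T, v2)   # v1^T · T · v2, slice by slice
    return g_rnn(v1, v2) + np.tanh(bilinear)

# Bottom-up over the tree (Clowns (tell jokes)):
clowns, tell, jokes = (np.random.rand(n) for _ in range(3))
p1 = g_rntn(tell, jokes)
p2 = g_rntn(clowns, p1)
```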


Page 13

How compositional is this?

Successful in practice

Uses some element of grammatical structure (the parse tree)

But the single compositionality function has to do everything

Does that help us understand what’s going on?


Page 14

Information-routing words

[TreeRNN over “Clowns who tell jokes”: the vectors Clowns, who, tell, jokes are composed pairwise, with “who” entering as just another word vector]


Page 15

Information-routing words

[TreeRNN over “John introduces himself”: the vectors John, introduces, himself are composed pairwise, with “himself” entering as just another word vector]


Page 16

Can we map pregroup grammar onto TreeRNNs?

[Tree, composed bottom-up: (Clowns (tell jokes))]

p1 = g(tell, jokes)
p2 = g(Clowns, p1)


Page 17

Can we map pregroup grammar onto TreeRNNs?

Clowns tell jokes

[The same tree, with each node computed by g_LinTen]

p1 = g_LinTen(tell, jokes)
p2 = g_LinTen(Clowns, p1)
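Reading g_LinTen as the RNTN composition with the nonlinearities stripped out (an assumption on my part; the slides do not spell the function out), each node becomes a purely multilinear map, so the whole tree is one big tensor contraction:

```python
import numpy as np

n = 4
M = np.random.rand(n, 2 * n)    # linear part
T = np.random.rand(n, n, n)     # bilinear (tensor) part

def g_linten(v1, v2):
    # same shape as g_RNTN, but with no squashing functions
    return M @ np.concatenate([v1, v2]) + np.einsum('i,kij,j->k', v1, T, v2)

clowns, tell, jokes = (np.random.rand(n) for _ in range(3))
p2 = g_linten(clowns, g_linten(tell, jokes))
```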


Page 18

Can we map pregroup grammar onto TreeRNNs?

[Diagram: the tree for “Clowns tell jokes” redrawn as a string diagram, with each g_LinTen node a tensor]


Page 19

Why?

Opens up more possibilities to use tools from formal semantics in computational linguistics.

We can immediately see possibilities for building alternative networks, perhaps with different compositionality functions for different parts of speech.

Decomposing the tensors for functional words into repeated applications of a compositionality function gives options for learning representations.


Page 20

Why?

who : n^r n s^l s

[Diagram: with this type, the wiring of “who” makes “dragons who breathe fire” equal, as a diagram, to “dragons breathe fire”]


Page 21

Why?

himself : n s^r n^rr n^r s

[Diagram: the wiring of “himself” reduces “John loves himself” to the diagram of “John loves” with the subject wire routed into the object position]


Page 22

Experiments?

Not yet. But there are a number of avenues for exploration:

Comparing the performance of this kind of model against standard categorical compositional distributional models.

Different compositionality functions for different word types

Testing the performance of TreeRNNs with formally analyzed information-routing words.

Investigating the effects of switching between word types.

Investigating meanings of logical words and quantifiers.

Extending the analysis to other types of recurrent neural network, such as long short-term memory networks or gated recurrent units.


Page 23

Summary

We have shown how to interpret a simplification of recursive neural networks within a formal semantics framework.

We can then analyze ‘information-routing’ words such as pronouns as specific functions rather than as vectors.

This also provides a simplification of tensor-based vector composition architectures, reducing the number of higher-order tensors to be learnt and making representations more flexible and reusable.

Plenty of work to do on both the experimental and the theoretical side!


Page 24

Thanks!

NWO Veni grant ‘Metaphorical Meanings for Artificial Agents’


Page 25

Category-Theoretic Background

The category of pregroups Preg and the category of finite-dimensional vector spaces FdVect are both compact closed.

This means that they share a structure, namely:

Both have a tensor product ⊗ with a unit 1

Both have adjoints A^r, A^l

Both have special morphisms

ε^r : A ⊗ A^r → 1,  ε^l : A^l ⊗ A → 1

η^r : 1 → A^r ⊗ A,  η^l : 1 → A ⊗ A^l

These morphisms interact in a certain way.

In Preg:

p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l


Page 26

A functor from syntax to semantics

We define a functor F : Preg → FdVect such that:

F(p) = P for all p ∈ B
F(1) = R

F(p · q) = F(p) ⊗ F(q)

F(p^r) = F(p^l) = F(p)

F(p ≤ q) = F(p) → F(q)

F(ε^r) = F(ε^l) = inner product in FdVect

F(η^r) = F(η^l) = identity maps in FdVect

[Kartsaklis, Sadrzadeh, Pulman and Coecke, 2016]
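On objects the functor just says that a pregroup type determines the shape of the tensor inhabiting it, with adjoints forgotten. A toy rendering (the dimensions are stand-ins):

```python
dims = {'n': 4, 's': 3}    # illustrative dimensions for F(n) = N and F(s) = S

def F_ob(pregroup_type):
    """Map a type like 'nr s nl' to a tensor shape; F(p^r) = F(p^l) = F(p)."""
    return tuple(dims[token[0]] for token in pregroup_type.split())

print(F_ob('n'))        # (4,)       a noun vector in N
print(F_ob('nr s nl'))  # (4, 3, 4)  a transitive verb in N ⊗ S ⊗ N
```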


Page 27

References I
