The Lexicon: An Interdisciplinary Introduction

Elisabetta Jezek, University of Pavia

August 6-10, 2018, ESSLLI 2018

St. Kliment Ohridski University, Sofia, Bulgaria

Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction


Course Outline

Lecture 1. Basics on the lexicon, word types, and concept-word mapping.

Lecture 2. The global structure of the lexicon: word classes and word relations.

Lecture 3. Varieties of linguistic evidence in favour of context-sensitive models of lexical meaning.

Lecture 4. Lexical information and its interplay with cognition and pragmatic inference.

Lecture 5. The meaning of verbs and its representation in compositional vector space models.


The distributional semantics framework

The distributional hypothesis is a hypothesis on the nature of linguistic meaning.

It is based on the methodology of linguistic analysis developed in the 1950s by the American linguist Z. Harris (Harris 1956) and on parallel work carried out in British lexicology by J.R. Firth (Firth 1957).

It maintains that the meaning of a word correlates with its distribution, i.e. the set of contexts in which it occurs.


The distributional semantic framework

The hypothesis predicts that it is possible to pin down the meaning of words in a relational fashion (that is, the meaning of one word with respect to the meaning of one or more other words) by comparing the sets of contexts in which these words occur.

Words that are observed to have similar co-occurrence profiles are likely to be semantically related in some way (Harris, 1954; Sahlgren, 2008).


Building Distributional Semantic Models (DSMs)

The distributional hypothesis is used as the guiding principle to build computational models of meaning representation, called distributional semantic models (word-space models in Sahlgren 2006), based on large-scale corpora.

Researchers use different techniques based on distributional analysis to translate the contextual behavior of words into representations of their meaning.


Geometric metaphor of meaning

DSMs are based on a geometric metaphor of meaning.

Meanings are locations in a semantic space, and semantic similarity is proximity between the locations (Sahlgren 2006, 19).

The key notion for going from distributional information to a geometric representation of meaning is the notion of vector.


Context as Co-occurrence Matrices

The sum of a word's contexts is tabulated in a co-occurrence matrix and thereby transformed into a vector with n dimensions.

Each dimension records the number of times a word occurs in a given context (where context stands for another word, but can also be a region of text or a whole document).


Context as Co-occurrence Matrices

In the figure below, dog, cat, and car are the target words and runs and legs the contexts.

        runs  legs
dog       1     4
cat       1     5
car       4     0

Context (or distributional) vectors are defined as the rows of the matrix. The co-occurrence-count list for the target word dog is (1, 4); that for cat is (1, 5), etc.

Such ordered lists of numbers are vectors.
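The step from counts to vectors can be written down directly; a minimal Python sketch, where the counts for dog and cat are those given above and the row for car, (4, 0), is assumed to match vector v3 on the following slide:

```python
# Toy co-occurrence matrix: rows are target words, columns are contexts.
# dog = (1, 4) and cat = (1, 5) are the counts from the slide; the row
# for car, (4, 0), is assumed from vector v3 shown on the next slide.
contexts = ("runs", "legs")
matrix = {
    "dog": (1, 4),
    "cat": (1, 5),
    "car": (4, 0),
}

# Each row is an ordered list of numbers, i.e. a context vector
# with as many dimensions as there are contexts.
for word, vector in matrix.items():
    assert len(vector) == len(contexts)

print(matrix["dog"])  # (1, 4)
```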


Semantic space and Vectors

Example of a 2-dimensional vector space representation with three vectors v1 = (1, 5), v2 = (1, 4), and v3 = (4, 0).


Semantic similarity as proximity between vectors

Proximity between word vectors - which represent distributions - is taken as an index of meaning similarity.

The vector cosine is generally adopted to measure such proximity, even though other measures have been proposed (Weeds et al. 2004).
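The cosine measure can be computed directly from the toy vectors of the earlier slides; a minimal sketch in Python (dog = (1, 4), cat = (1, 5), car = (4, 0) as before):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

dog, cat, car = (1, 4), (1, 5), (4, 0)

# dog and cat point in nearly the same direction; car does not.
print(round(cosine(dog, cat), 3))  # 0.999
print(round(cosine(dog, car), 3))  # 0.243
```

The cosine ranges from -1 to 1 for real-valued vectors (0 to 1 for raw counts, which are non-negative), so higher values mean more similar distributions.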


Semantic relatedness

The relationship most typically modelled is general semantic relatedness, as opposed to more precise indications of, for instance, similarity (Hill et al., 2015).

Distributional semantic models have been effectively applied to tasks ranging from language modelling (Bengio, 2009) to metaphor classification (Gutiérrez et al., 2016) and the extrapolation of more fine-grained correspondences between concepts (Derrac and Schockaert, 2015).


The meaning of verbs and their representation in a vector-based model of compositionality

Three components in a verb's denotation, which together constitute different aspects of the same object, i.e. an eventuality.

Time and time structure.

Argumenthood.

Inherent Meaning.


Time and time structure

The denotation of a verb is an eventuality, and an eventuality is located and structured in time.

states (to own)

processes (to work, to sleep)

punctual events (to find, to arrive)

degree achievements (to ripen), incremental theme verbs (to fill), multi-scalar verbs (to increase)

semelfactives or points (to cough, to knock)

The Vendler-Dowty taxonomy, Smith 1991.


Event Structure

Events may be complex, i.e. they may include subevents (Parsons 1990, Pustejovsky 1991).

Causal and temporal relations hold between event parts (subevents).

Kill includes an action (cause die) which precedes a necessary effect (be dead).

Show includes an action (make visible) which is followed by a likely effect (see).


Event Structure for kill and show

(1) kill.v: event e divides into subevents e1 < e2
    e1: The fire killed the animals
    e2: The animals are dead

(2) show.v: event e divides into subevents e1 < e2
    e1: The festival showed the film / The film is shown
    e2: The film is seen


Argumenthood

Argumenthood is the property of requiring (a certain number and type of) slots to express the grammatically relevant participants in the event.

? no argument, only predication (to snow, to rain)

one argument (to run, to swim)

two arguments (to know, to participate)

three arguments (to put, to give)

Levin and Rappaport Hovav 2005, Argument Realization.


Argumenthood

Arguments have properties.

Thematic (or semantic) role.

Semantic Type.

Syntactic Realization.

Levin and Rappaport Hovav 2005, Argument Realization.


Thematic (or semantic) roles (Jezek 2016, The Lexicon)


Dynamic Argument Structure for build (Jezek and Pustejovsky 2017, Dynamic Argument Structure)

build.v
    argstr =
        d-arg:  resource = exist(material)
        h-arg1: resource = mod(material)
        h-arg2: result = init(artifact)
        t-arg:  result = exist(artifact)


Event Structure with Dynamic Argument Annotation for to build (Jezek and Pustejovsky 2017, Dynamic Argument Structure)

(3) build.v: event e divides into e1 and e2
    e1: init(result : artifact)
    e2 divides into subevents e11 ... e1k
        e11: exist(resource : material), mod(resource : material)
        e1k: exist(result : artifact)


Inherent Meaning

Events are complex arrays of semantic properties, and verbs and verb classes encode them by capitalising on specific ones.

motion verbs (go, walk, climb, ?sit)

manner verbs (wipe, scrub)

perception verbs (see, smell, hear, listen)

verbs of cognition (understand, grasp)

verbs of communication (talk, tell, whisper)

verbs expressing measures (cost, weigh)

Event Ontologies.


Co-composition

Verb meaning is co-dependent on the meanings of its arguments.

take a tablet ∣ a train.

like pizza ∣ my sister.

open the door ∣ a letter ∣ a restaurant ∣ a bank account ∣ a debate.

Formal mechanism of co-composition in Pustejovsky 1995.

Cf. “intersective method of combination is well-known to fail in many cases”, Baroni and Zamparelli 2010.


Verbs as tensors

Function application captures one aspect of verb semantics, i.e. its relational aspect.

The verb introduces a whole event.

How do we capture an event in vector-space models?

Corpus-harvested vectors encoding aspectual/temporal properties?

Build vectors/conceptual spaces representative of the type restrictions verbs place on their arguments?

Ponti, Jezek and Magnini 2016. Grounding the lexical sets of causative/inchoative verbs with word embedding. DSALT at ESSLLI 2016.

Cf. Grefenstette and Sadrzadeh 2015; McGregor, Purver and Wiggins 2015.
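One concrete reading of "verbs as tensors" (in the spirit of the categorical approach of Grefenstette and Sadrzadeh) treats a transitive verb as a bilinear map combining subject and object vectors into a sentence meaning. A minimal sketch, where the 2-dimensional noun vectors and the verb matrix are invented for illustration, not derived from a corpus:

```python
# Hypothetical 2-dimensional noun vectors (illustrative values only).
festival = [0.2, 1.0]
film = [1.0, 0.5]

# The verb "show" as a rank-2 tensor (a 2x2 matrix): not a point in
# noun space but a map that combines its two arguments.
show = [[0.9, 0.1],
        [0.3, 0.7]]

def compose(subj, verb, obj):
    """subject^T . V . object: one simple tensor-based composition."""
    return sum(subj[i] * verb[i][j] * obj[j]
               for i in range(len(subj))
               for j in range(len(obj)))

meaning = compose(festival, show, film)  # a scalar "sentence space" here
```

With a higher-rank tensor the output would itself be a vector in a dedicated sentence space; the scalar output here only keeps the sketch minimal.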


Building Verb Vectors

In distributional semantics, verbs have sometimes been represented through vectors built on the verbs themselves or on the sum of the argument vectors (see for the latter Grefenstette and Sadrzadeh, 2011; Kartsaklis and Sadrzadeh, 2013).

In Chersoni et al. 2016, syntactic joint contexts are used, abstracting from linear word windows by using dependencies.

Assuming a word window of 3 and using dependencies, a verb like to love is then represented by a collection of contexts such as:

“the new students the school campus”.
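The idea of a joint syntactic context can be sketched as follows. This is a toy illustration, not Chersoni et al.'s actual pipeline: the dependency parse is hard-coded, and the function name and relation ordering are illustrative choices.

```python
# Toy sketch of "joint syntactic contexts": instead of a linear window
# around the verb, collect its syntactic dependents as one joint context.

def joint_context(verb, dependencies):
    """Return the joint context of `verb`: its dependents from a
    dependency parse, ordered subject first, direct object last."""
    order = {"nsubj": 0, "iobj": 1, "dobj": 2}
    deps = [(rel, word) for head, rel, word in dependencies if head == verb]
    deps.sort(key=lambda rw: order.get(rw[0], 3))
    return tuple(word for _, word in deps)

# "The new students love the school campus":
# (head, relation, dependent) triples from a dependency parse
parse = [
    ("love", "nsubj", "students"),
    ("love", "dobj", "campus"),
]

print(joint_context("love", parse))  # ('students', 'campus')
```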

25/38

Augmented Verb Vectors

In Blundell, Sadrzadeh and Jezek 2017, we propose to build vectors for verbs by augmenting the verb vector with the vector(s) of the argument(s), with the goal of providing a better distributional representation for the verb itself.

We base our proposal on the theoretical assumption that argument structure is part of the meaning of the verb and not external to it (Pustejovsky, 1995; Van Valin, 2005; Levin and Rappaport Hovav, 2005).
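A minimal sketch of the augmentation idea, using toy 4-dimensional stand-ins for real pre-trained vectors; the function name and the default summation operation are illustrative choices, not the paper's implementation.

```python
import numpy as np

# Toy 4-d stand-ins for pre-trained word vectors (illustrative only)
vec = {
    "love":     np.array([0.9, 0.1, 0.0, 0.2]),
    "students": np.array([0.2, 0.8, 0.1, 0.0]),
    "campus":   np.array([0.1, 0.2, 0.9, 0.3]),
}

def augment(verb, args, combine=np.add):
    """Augment the verb vector with its arguments' vectors via a
    point-wise combination operation (default: summation)."""
    v = vec[verb].copy()
    for a in args:
        v = combine(v, vec[a])
    return v

# "The students love the campus": fold subject and object into the verb
print(augment("love", ["students", "campus"]))  # element-wise sum of the three
```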

26/38

Similarity task

We test our augmented vectors on a similarity task using the SimVerb-3500 dataset (Gerz et al. 2016), designed to represent the complexity of verb meanings and to gain a better understanding of verb semantics in distributional models.

The dataset contains 3500 verb pairs (827 distinct verbs) with at least 10 human similarity ratings for each pair (0-10).

respond/reply 9.79 vs. run/hit 0.17.

Annotated relations: antonyms, synonyms, hyper/hyponyms, no relation.

27/38

SimVerb-3500 (excerpts)

28/38

Similarity task

We analyse the dataset in three different ways, based on the number of the

29/38

Models

We use four different argument combination models to augment the verb vectors, in two different ways: conjunctive and disjunctive.

Disjunctive operations: summation, point-wise maximum.

Conjunctive operations: point-wise multiplication, point-wise minimum, and Kronecker tensor product.
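The operations can be illustrated with numpy on toy 2-dimensional vectors; this is a sketch of the operations themselves, not the experimental code.

```python
import numpy as np

v = np.array([1.0, 2.0])  # toy verb vector
a = np.array([3.0, 0.5])  # toy argument vector

combinations = {
    "sum":       v + a,             # disjunctive
    "max":       np.maximum(v, a),  # disjunctive
    "mult":      v * a,             # conjunctive
    "min":       np.minimum(v, a),  # conjunctive
    "kronecker": np.kron(v, a),     # conjunctive; dimension |v| * |a|
}

for name, result in combinations.items():
    print(f"{name}: {result}")
```

Note that the Kronecker product is the only operation that changes the dimensionality of the result, which matters when comparing augmented vectors with plain verb vectors.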

30/38

Vector spaces

The resulting representations are evaluated on the verb similarity task in three different vector spaces, trained on the parsed version of the UKWacky corpus.

TensorFlow Skip-gram

Word2vec CBOW

Count-based model (PPMI-normalized co-occurrence counts).
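The PPMI normalization behind the count-based model can be sketched as follows, on a toy count matrix rather than UKWacky data.

```python
import numpy as np

# Toy word-by-context co-occurrence counts (rows: words, cols: contexts)
counts = np.array([[4.0, 0.0, 1.0],
                   [1.0, 3.0, 0.0]])

def ppmi(C):
    """Positive PMI normalization of a co-occurrence count matrix:
    PPMI(w, c) = max(0, log2 p(w, c) / (p(w) p(c)))."""
    total = C.sum()
    p_wc = C / total
    p_w = p_wc.sum(axis=1, keepdims=True)
    p_c = p_wc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0.0)  # clip negatives (and -inf for zero counts)

print(ppmi(counts))
```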

31/38

Subj-Obj combination Formulae


32/38

Subject combination Formulae


33/38

500 Development Set


34/38

3000 Test Set


35/38

Results

In line with Padó and Erk 2008, we show that argument-augmented models perform better than verb-only base models in the similarity task.

Specifically, the conjunctive models based on point-wise multiplication and on the Kronecker tensor product perform better than the baseline of verb-only vectors and than the other operations.

The best model is the count-based model.

The best optimization of the dataset is the subset with the top 5 percent by number of Subjects/Objects removed.

Blundell, Sadrzadeh and Jezek 2017. “Experimental results on Exploiting Predicate-Argument Structure for Verb Similarity in Distributional Semantics”, Clasp Papers in Computational Linguistics, Gothenburg.

36/38

Concluding observations and lines of research

A plea for moderate minimalism in lexical semantics.

Enriched and dynamic models of composition (beyond sum) in formal semantics, incorporating gradience and constraints in semantic phenomena.

Probabilistic approaches to identify the degree of stability of meaning components in the lexicon.

Distributional methodology and geometric representations to gain a broader understanding of the structure of our mental lexicon.

Language properties such as semantic context-sensitivity are not solved in formal semantics: merging formal, distributional and probabilistic approaches brings benefits on all sides.

37/38

THANK YOU FOR LISTENING!

38/38
