The Lexicon: An Interdisciplinary Introduction
Elisabetta Jezek, University of Pavia
August 6-10, 2018, ESSLLI 2018
St. Kliment Ohridski University, Sofia, Bulgaria
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
Course Outline
Lecture 1. Basics on lexicon, word types, and concept-word mapping.
Lecture 2. The global structure of the lexicon: word classes and word relations.
Lecture 3. Varieties of linguistic evidence in favour of context-sensitive models of lexical meaning.
Lecture 4. Lexical information and its interplay with cognition and pragmatic inference.
Lecture 5. The meaning of verbs and its representation in compositional vector space models.
The distributional semantics framework
The distributional hypothesis is a hypothesis on the nature of linguistic meaning.
It is based on the methodology of linguistic analysis developed in the 1950s by the American linguist Z. Harris (Harris 1956) and on parallel work carried out in British lexicology by J.R. Firth (Firth 1957).
It maintains that the meaning of a word correlates with its distribution, i.e. the set of contexts in which it occurs.
The distributional semantic framework
The hypothesis predicts that it is possible to pin down the meaning of words in a relational fashion (that is, the meaning of one word with respect to the meaning of one or more other words) by comparing the sets of contexts in which these words occur.
Words that are observed to have similar co-occurrence profiles are likely to be semantically related in some way (Harris, 1954; Sahlgren, 2008).
Building Distributional Semantic Models (DSMs)
The distributional hypothesis is used as the guiding principle to build computational models of meaning representation, called distributional semantic models (word-space models in Sahlgren 2006), based on large-scale corpora.
Researchers use different techniques based on distributional analysis to translate the contextual behaviour of words into representations of their meaning.
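One such technique, counting co-occurrences within a fixed word window, can be sketched as follows (the toy corpus and the window size are illustrative assumptions, not a specific model from the literature):

```python
from collections import defaultdict

# Toy corpus; in practice DSMs are built from large-scale corpora.
corpus = "the dog runs the cat runs the dog has legs the cat has legs".split()

def cooccurrence_counts(tokens, window=2):
    """Count how often each word co-occurs with each context word
    within a symmetric window of the given size."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts

counts = cooccurrence_counts(corpus)
print(counts["dog"]["runs"])  # how often "runs" appears near "dog"
```

The rows of the resulting table are exactly the context vectors discussed in the following slides.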
Geometric metaphor of meaning
DSMs are based on a geometric metaphor of meaning.
Meanings are locations in a semantic space, and semantic similarity is proximity between the locations (Sahlgren 2006, 19).
The key notion for going from distributional information to a geometric representation of meaning is that of a vector.
Contexts as Co-occurrence Matrices
The sum of a word’s contexts is tabulated in a co-occurrence matrix and thereby transformed into a vector with n dimensions.
Each dimension records the number of times a word occurs in a given context (where a context typically stands for another word, but can also be a region of text or a whole document).
Contexts as Co-occurrence Matrices
In the Figure below, dog, cat, and car are target words and runs and legs the contexts.
Context (or distributional) vectors are defined as the rows of the matrix. The co-occurrence-count list for the target word dog is (1, 4); that for cat is (1, 5), etc.
Such ordered lists of numbers are vectors.
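The matrix can be written down directly; the counts below are the ones given on the slide, and the dictionary encoding is just an illustrative choice:

```python
# Co-occurrence matrix: rows are target words, columns the
# contexts "runs" and "legs" (counts as given on the slide).
contexts = ["runs", "legs"]
matrix = {
    "dog": [1, 4],  # dog co-occurs once with "runs", 4 times with "legs"
    "cat": [1, 5],
    "car": [4, 0],
}

# Each row of the matrix is the context (distributional) vector
# of its target word.
dog_vector = matrix["dog"]
print(dog_vector)  # [1, 4]
```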
Semantic space and Vectors
Example of a 2-dimensional vector space representation with three vectors v1 = (1, 5), v2 = (1, 4), and v3 = (4, 0).
Semantic similarity as proximity between vectors
Proximity between word vectors, which represent distributions, is taken as an index of meaning similarity.
The vector cosine is generally adopted to measure such proximity, even though other measures have been proposed (Weeds et al. 2004).
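The vector cosine can be computed directly from its definition; the sketch below uses the three vectors of the running example:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Vectors from the running example: cat, dog, car.
v1, v2, v3 = (1, 5), (1, 4), (4, 0)
print(cosine(v1, v2))  # close to 1: cat and dog are distributionally similar
print(cosine(v1, v3))  # much lower: car is far from cat in the space
```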
Semantic relatedness
The relationship most typically modelled is general semantic relatedness, as opposed to more precise indications of, for instance, similarity (Hill et al., 2015).
Distributional semantic models have been effectively applied to tasks ranging from language modelling (Bengio, 2009) to metaphor classification (Gutiérrez et al., 2016) and the extrapolation of more fine-grained correspondences between concepts (Derrac and Schockaert, 2015).
The meaning of verbs and their representation in a vector-based model of compositionality
Three components in a verb's denotation, which together constitute different aspects of the same object, i.e. an eventuality.
Time and time structure.
Argumenthood.
Inherent Meaning.
Time and time structure
The denotation of a verb is an eventuality, and an eventuality is located and structured in time.
states (to own)
processes (to work, to sleep)
punctual events (to find, to arrive)
degree achievements (to ripen), incremental theme verbs (to fill), multi-scalar verbs (to increase)
semelfactives or points (to cough, to knock)
The Vendler-Dowty taxonomy, Smith 1991.
Event Structure
Events may be complex, i.e. they may include subevents (Parsons 1990, Pustejovsky 1991).
Causal and temporal relations hold between event parts (subevents).
Kill includes an action (cause die) which precedes a necessary effect (be dead).
Show includes an action (make visible) which is followed by a likely effect (see).
Event Structure for kill and show
(1) kill.v: e = e1 < e2
      e1: The fire killed the animals
      e2: The animals are dead
(2) show.v: e = e1 < e2
      e1: The festival showed the film / The film is shown
      e2: The film is seen
Argumenthood
Argumenthood is the property of requiring (a certain number and type of) slots to express the grammatically relevant participants in the event.
?no argument, only predication (to snow, to rain)
one argument (to run, to swim)
two arguments (to know, to participate)
three arguments (to put, to give)
Levin and Rappaport Hovav 2005, Argument Realization.
Argumenthood
Arguments have properties.
Thematic (or semantic) role.
Semantic Type.
Syntactic Realization.
Levin and Rappaport Hovav 2005, Argument Realization.
Thematic (or semantic) roles
Jezek 2016, The Lexicon
Dynamic Argument Structure for build
Jezek and Pustejovsky 2017, Dynamic Argument Structure
build.v
  argstr =
    d-arg:  resource = exist(material)
    h-arg1: resource = mod(material)
    h-arg2: result   = init(artifact)
    t-arg:  result   = exist(artifact)
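The attribute-value structure above can be sketched as a plain data structure; the field names come from the slide, but the dictionary encoding is an illustrative choice, not the formalism of Jezek and Pustejovsky 2017:

```python
# Hedged sketch: the dynamic argument structure of "build" as a
# dictionary. Each argument slot maps to a (role, state) pair taken
# from the slide; the encoding itself is only for illustration.
build_argstr = {
    "verb": "build",
    "argstr": {
        "d-arg": ("resource", "exist(material)"),
        "h-arg1": ("resource", "mod(material)"),
        "h-arg2": ("result", "init(artifact)"),
        "t-arg": ("result", "exist(artifact)"),
    },
}

print(build_argstr["argstr"]["t-arg"])  # the result argument's final state
```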
Event Structure with Dynamic Argument Annotation for to build
Jezek and Pustejovsky 2017, Dynamic Argument Structure
(3) build.v
      e1: init(result : artifact)
      e2:
        e11: exist(resource : material)
             mod(resource : material)
        e1k: exist(result : artifact)
Inherent Meaning
Events are complex arrays of semantic properties, and verbs and verb classes encode them by capitalising on specific ones.
motion verbs (go, walk, climb, ?sit)
manner verbs (wipe, scrub)
perception verbs (see, smell, hear, listen)
verbs of cognition (understand, grasp)
verbs of communication (talk, tell, whisper)
verbs expressing measures (cost, weigh)
Event Ontologies.
Co-composition
Verb meaning is co-dependent on the meanings of its arguments.
take a tablet ∣ a train.
like pizza ∣ my sister.
open the door ∣ a letter ∣ a restaurant ∣ a bank account ∣ a debate.
Formal mechanism of co-composition in Pustejovsky 1995.
Cf. “intersective method of combination is well-known to fail in many cases”, Baroni and Zamparelli 2010.
Verbs as tensors
Function application captures one aspect of verb semantics, i.e. its relational aspect.
The verb introduces a whole event.
How do we capture an event in vector-space models?
Corpus-harvested vectors encoding aspectual/temporal properties?
Build vectors/conceptual spaces representative of the type restrictions verbs place on their arguments?
Ponti, Jezek and Magnini 2016. Grounding the lexical sets of causative/inchoative verbs with word embedding. DSALT at ESSLLI 2016.
Cf. Grefenstette and Sadrzadeh 2015; McGregor, Purver and Wiggins 2015.
Building Verb Vectors
In distributional semantics, verbs have sometimes been represented through vectors built on the verbs themselves or on the sum of the argument vectors (see for the latter Grefenstette and Sadrzadeh, 2011; Kartsaklis and Sadrzadeh, 2013).
In Chersoni et al. 2016, syntactic joint contexts are used, abstracting from linear word windows by using dependencies.
Assuming a word window of 3 and using dependencies, a verb like to love is then represented by a collection of contexts such as:
“the new students the school campus”.
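A minimal sketch of how such a context can be obtained: the sentence below is invented for illustration, and simply masking the verb slot stands in for the dependency-based extraction of Chersoni et al. 2016:

```python
# Hedged sketch of a "joint context": the verb is represented by the
# words around it, with the verb slot itself abstracted away. The
# sentence and the verb position are illustrative assumptions.
sentence = ["the", "new", "students", "love", "the", "school", "campus"]
verb_index = 3  # position of "love"

def joint_context(tokens, verb_idx):
    """Return the context of the verb: the sentence with the verb
    slot removed."""
    return [t for i, t in enumerate(tokens) if i != verb_idx]

print(" ".join(joint_context(sentence, verb_index)))
# prints "the new students the school campus"
```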
Augmented Verb Vectors
In Blundell, Sadrzadeh and Jezek 2017, we propose to build vectors for verbs by augmenting the verb vector with the vector(s) of its argument(s), with the goal of providing a better distributional representation of the verb itself.
We base our proposal on the theoretical assumption that argument structure is part of the meaning of the verb and not external to it (Pustejovsky 1995; Van Valin 2005; Levin and Rappaport Hovav 2005).
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
26/38
Similarity task
We test our augmented vectors on a similarity task using the SimVerb-3500 dataset (Gerz et al. 2016), designed to represent the complexity of verb meanings and to gain a better understanding of verb semantics in distributional models.
The dataset contains 3500 verb pairs (827 distinct verbs) with at least 10 human similarity ratings per pair (0-10).
E.g. respond/reply 9.79 vs. run/hit 0.17.
Annotated relations: antonyms, synonyms, hyper-/hyponyms, no relation.
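A sketch of how similarity datasets of this kind are typically scored: cosine similarity between verb vectors, then Spearman correlation against the human ratings. The toy vectors and the third pair below are invented for illustration; they are not the actual SimVerb data or the authors' code.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def ranks(xs):
    # Rank values (1 = smallest); ties broken by position, fine for a sketch.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    # Spearman rho = Pearson correlation computed on the rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical 2-d verb vectors and human similarity ratings for three pairs.
vectors = {"respond": [0.9, 0.1], "reply": [0.8, 0.2],
           "run": [0.1, 0.9], "hit": [0.5, 0.5]}
pairs = [("respond", "reply", 9.79), ("run", "hit", 0.17), ("respond", "run", 1.0)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human_scores = [h for _, _, h in pairs]
rho = spearman(model_scores, human_scores)
```

The model is judged by how well its similarity ranking of the pairs matches the humans' ranking, not by the raw scores themselves.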
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
27/38
SimVerb-3500 (excerpts)
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
28/38
Similarity task
We analyse the dataset in three different ways, based on the number of Subject/Object arguments attested for each verb.
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
29/38
Models
We use a range of argument combination operations to augment the verb vectors, grouped into two families: disjunctive and conjunctive.
Disjunctive operations: summation, point-wise maximum.
Conjunctive operations: point-wise multiplication, point-wise minimum, and the Kronecker tensor product.
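A minimal sketch of these operations on toy 3-dimensional vectors (plain Python; the numbers are invented for illustration):

```python
# Toy verb and argument vectors (hypothetical values).
v = [1.0, 2.0, 0.5]
a = [0.5, 1.0, 2.0]

# Disjunctive operations: each component keeps information from either vector.
summation = [x + y for x, y in zip(v, a)]       # [1.5, 3.0, 2.5]
pw_max    = [max(x, y) for x, y in zip(v, a)]   # [1.0, 2.0, 2.0]

# Conjunctive operations: each component is strong only if strong in both.
pw_mult = [x * y for x, y in zip(v, a)]         # [0.5, 2.0, 1.0]
pw_min  = [min(x, y) for x, y in zip(v, a)]     # [0.5, 1.0, 0.5]

# Kronecker tensor product: every pairwise product, dimension |v| * |a| = 9.
kron = [x * y for x in v for y in a]
```

Note that all operations except the Kronecker product preserve the original dimensionality; the Kronecker product expands it multiplicatively.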
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
30/38
Vector spaces
The resulting representations are evaluated on the verb similarity task in three different vector spaces, trained on the parsed version of the UKWacky corpus.
TensorFlow Skip-gram
Word2vec CBOW
Count-based model (PPMI-normalized co-occurrence counts).
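A minimal sketch of the PPMI weighting behind such count-based models (toy co-occurrence counts, not from the actual corpus):

```python
import math

# Toy co-occurrence counts: (target word, context word) -> frequency.
counts = {
    ("love", "students"): 8, ("love", "campus"): 2,
    ("hate", "students"): 1, ("hate", "campus"): 4,
}

total = sum(counts.values())
row, col = {}, {}
for (w, c), n in counts.items():
    row[w] = row.get(w, 0) + n   # marginal count of the target word
    col[c] = col.get(c, 0) + n   # marginal count of the context word

def ppmi(w, c):
    # PMI = log2( P(w,c) / (P(w) P(c)) ); PPMI clips negative values to zero,
    # so only associations above chance contribute to the vector.
    n_wc = counts.get((w, c), 0)
    if n_wc == 0:
        return 0.0
    pmi = math.log2(n_wc * total / (row[w] * col[c]))
    return max(pmi, 0.0)
```

Each word's vector is then the row of PPMI values over all context words.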
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
31/38
Subj-Obj combination Formulae
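The formulae on this slide did not survive extraction. As an assumed illustration of their general shape (not a reconstruction of the originals), a summation-based subject-object augmentation of a verb vector would be:

```latex
% Hypothetical example: summation-based augmentation (one of several operations)
\vec{v}_{\mathit{aug}} \;=\; \vec{v}
  \;+\; \sum_{s \,\in\, \mathrm{Subj}(v)} \vec{s}
  \;+\; \sum_{o \,\in\, \mathrm{Obj}(v)} \vec{o}
```

where Subj(v) and Obj(v) are the corpus-attested subject and object fillers of the verb; the other operations replace the sums with point-wise maximum, minimum, multiplication, or the Kronecker product.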
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
32/38
Subject combination Formulae
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
33/38
500 Development Set
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
34/38
3000 Test Set
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
35/38
Results
In line with Padó and Erk 2008, we show that argument-augmented models perform better than verb-only base models in the similarity task.
Specifically, the conjunctive models based on point-wise multiplication and on the Kronecker tensor product perform better than the baseline of verb-only vectors and than the other operations.
The best-performing vector space is the count-based model.
The best-performing subset of the dataset is the one with the top 5 percent of Subject/Object counts removed.
Blundell, Sadrzadeh and Jezek 2017. “Experimental results on Exploiting Predicate-Argument Structure for Verb Similarity in Distributional Semantics”, CLASP Papers in Computational Linguistics, Gothenburg.
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
36/38
Concluding observations and lines of research
A plea for moderate minimalism in lexical semantics.
An enriched and dynamic model of composition (beyond summation) in formal semantics, incorporating gradience and constraints in semantic phenomena.
A probabilistic approach to identifying the degree of stability of meaning components in the lexicon.
Distributional methodology and geometric representations to gain a broader understanding of the structure of our mental lexicon.
Language properties such as semantic context-sensitivity are not solved in formal semantics: merging formal, distributional and probabilistic approaches yields benefits on multiple fronts.
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
37/38
THANK YOU FOR LISTENING!
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction
38/38
Elisabetta Jezek The Lexicon: an Interdisciplinary Introduction