Learnability and Semantic Universals


Shane Steinert-Threlkeld (joint work-in-progress with Jakub Szymanik)

Cognitive Semantics and Quantities Kick-off Workshop, September 28, 2017


Overview

1 Semantic Universals

2 The Learning Model

3 Experiments

4 Conclusion


Universals in Linguistic Theory

Question

What is the range of variation in human languages? That is: which, out of all of the logically possible languages that humans could speak, do they in fact speak?


Example Universals

Sound (phonology): All languages have consonants and vowels. Every language has at least /a i u/.

Grammar (syntax): All languages have verbs and nouns.

Meaning (semantics): All languages have syntactic constituents (NPs) whose semantic function is to express generalized quantifiers. (Barwise and Cooper 1981)


Determiners

Determiners:

Simple: every, some, few, most, five, ...
Complex: all but five, fewer than three, at least eight or fewer than five, ...

Denote type 〈1, 1〉 generalized quantifiers: sets of models of the form 〈M, A, B〉 with A, B ⊆ M

For example:

⟦every⟧ = {〈M, A, B〉 : A ⊆ B}
⟦most⟧ = {〈M, A, B〉 : |A ∩ B| > |A \ B|}
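As a concrete illustration (my own sketch, not from the slides), these two denotations can be written as Python predicates over finite models, with the domain and the two arguments given as sets:

# Minimal sketch: type <1,1> generalized quantifiers as predicates on finite models.
# A model is a triple (M, A, B) with A, B subsets of the domain M.

def every(M, A, B):
    """⟦every⟧ = {⟨M, A, B⟩ : A ⊆ B}"""
    return A <= B

def most(M, A, B):
    """⟦most⟧ = {⟨M, A, B⟩ : |A ∩ B| > |A \\ B|}"""
    return len(A & B) > len(A - B)

M = {1, 2, 3, 4, 5}
A = {1, 2, 3}          # e.g. the French people
B = {1, 2, 4}          # e.g. the smokers
print(every(M, A, B))  # False: 3 is in A but not in B
print(most(M, A, B))   # True: |A ∩ B| = 2 > |A \ B| = 1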


Monotonicity

Many French people smoke cigarettes ⇒ Many French people smoke

Few French people smoke ⇒ Few French people smoke cigarettes

Q is upward (resp. downward) monotone iff: if 〈M, A, B〉 ∈ Q and B ⊆ B′ (resp. B′ ⊆ B), then 〈M, A, B′〉 ∈ Q

A determiner is monotone if it denotes either an upward or downward monotone generalized quantifier
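A brute-force sketch (my illustration, not from the slides) of checking upward monotonicity on a small fixed domain, by enumerating all supersets B′ of B; the example quantifiers ≥ 4 and = 4 reappear in Experiment 1 below:

from itertools import chain, combinations

def subsets(S):
    """All subsets of the finite set S."""
    S = list(S)
    return [set(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def upward_monotone(Q, M):
    """Check on domain M: if ⟨M,A,B⟩ ∈ Q and B ⊆ B′, then ⟨M,A,B′⟩ ∈ Q."""
    for A in subsets(M):
        for B in subsets(M):
            if not Q(M, A, B):
                continue
            for Bp in subsets(M):
                if B <= Bp and not Q(M, A, Bp):
                    return False
    return True

def at_least_4(M, A, B):
    return len(A & B) >= 4

def exactly_4(M, A, B):
    return len(A & B) == 4

M = set(range(5))
print(upward_monotone(at_least_4, M))  # True
print(upward_monotone(exactly_4, M))   # False: adding B-elements can overshoot 4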


Monotonicity Universal

Monotonicity Universal (Barwise and Cooper 1981)

All simple determiners are monotone.


Quantity

Key intuition: determiners denote general relations between their restrictor and nuclear scope, not ones that depend on the identities of particular elements of the domain

Q is permutation-closed iff: if 〈M, A, B〉 ∈ Q and π is a permutation of M, then π(〈M, A, B〉) ∈ Q

Q is isomorphism-closed iff: if 〈M, A, B〉 ∈ Q and 〈M, A, B〉 ≅ 〈M′, A′, B′〉, then 〈M′, A′, B′〉 ∈ Q

Q is quantitative iff: if 〈M, A, B〉 ∈ Q and A ∩ B, A \ B, B \ A, M \ (A ∪ B) have the same cardinalities as their primed counterparts (A′ ∩ B′, etc.), then 〈M′, A′, B′〉 ∈ Q
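A simplified sketch of the quantitativity test (my illustration), restricted to a single fixed domain M: any two models with the same four cell cardinalities must receive the same truth value. The example quantifiers are assumptions for illustration only.

from itertools import chain, combinations

def subsets(S):
    S = list(S)
    return [set(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def signature(M, A, B):
    # the four cardinalities a quantitative quantifier may depend on
    return (len(A & B), len(A - B), len(B - A), len(M - (A | B)))

def is_quantitative(Q, M):
    """Check on domain M: models with equal signatures get equal truth values."""
    by_sig = {}
    for A in subsets(M):
        for B in subsets(M):
            sig, val = signature(M, A, B), Q(M, A, B)
            if by_sig.setdefault(sig, val) != val:
                return False
    return True

# ⟦most⟧ depends only on |A ∩ B| and |A \ B|, so it passes:
print(is_quantitative(lambda M, A, B: len(A & B) > len(A - B), set(range(4))))  # True
# a quantifier that mentions a particular individual does not:
print(is_quantitative(lambda M, A, B: 0 in (A & B), set(range(4))))             # False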


Quantity

Keenan and Stavi 1986, p. 311: “Monomorphemic dets are logical [= permutation-closed].”

Peters and Westerstahl 2006, p. 330: “All lexical quantifier expressions in natural languages denote ISOM [= isomorphism-closed] quantifiers.”


Quantity Universal

Quantity Universal

All simple determiners are quantitative.


Conservativity

Key intuition: the restrictor restricts what the determiner talks about

Many French people smoke cigarettes ≡ Many French people are French people who smoke cigarettes

Q is conservative iff: 〈M, A, B〉 ∈ Q if and only if 〈M, A, A ∩ B〉 ∈ Q
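A small sketch (my illustration, not from the slides) checking conservativity by enumeration, using the textbook observation that “only”, read as a determiner, is not conservative:

from itertools import chain, combinations

def subsets(S):
    S = list(S)
    return [set(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def is_conservative(Q, M):
    """Check on domain M: ⟨M,A,B⟩ ∈ Q iff ⟨M,A,A∩B⟩ ∈ Q, for all A, B."""
    return all(Q(M, A, B) == Q(M, A, A & B)
               for A in subsets(M) for B in subsets(M))

def every(M, A, B):
    return A <= B          # conservative

def only(M, A, B):
    return B <= A          # not conservative

M = set(range(4))
print(is_conservative(every, M))  # True
print(is_conservative(only, M))   # False: A={0}, B={0,1}: only(A,B) is False but only(A, A∩B) is True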


Conservativity Universal

Conservativity Universal (Barwise and Cooper 1981)

All simple determiners are conservative.


Explaining Universals

Natural Question

Why do the universals hold? What explains the limited range of quantifiers expressed by simple determiners?


Answer 1: learnability. (Barwise and Cooper 1981; Keenan and Stavi 1986; Szabolcsi 2010)

The universals greatly restrict the search space that a language learner must explore when learning the meanings of determiners. This makes it easier (possible?) for them to learn such meanings from relatively small input. [Compare: Poverty of the Stimulus argument for UG.]

In a sense this must be true, but: “Likely, the unrestricted space has many hypotheses which are so implausible, they can be ignored quickly and do not affect learning. The hard part of learning may be choosing between the plausible competitor meanings, not in weeding out a large space of potential meanings.” (Piantadosi 2013, p. 22)


Answer 2: learnability. (Peters and Westerstahl 2006)

The universals aid learnability because quantifiers satisfying the universals are easier to learn than those that do not.

Challenge: provide a model of learning which makes good on this promise. [See Tiede 1999; Gierasimczuk 2007; Gierasimczuk 2009 for earlier attempts.]



Neural Networks

[Figure omitted. Source: Nielsen, “Neural Networks and Deep Learning”, Determination Press]


Gradient Descent and Back-propagation

The learning problem will be framed as a non-convex optimization problem.

A total loss function, which will be the mean of a ‘local’ error function:

L = (1/N) ∑ᵢ ℓ(ŷᵢ, yᵢ) = (1/N) ∑ᵢ ℓ(NN(θ, xᵢ), yᵢ)

Navigate the landscape of weight space towards lower loss:

θₜ₊₁ ← θₜ − α ∇θ L

Back-propagation: calculate ℓ(·, ·) after a forward pass of the network; ‘propagate’ the error backwards through the network to calculate the gradient.
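A toy illustration (my own, not the slides’ model) of the update rule θₜ₊₁ ← θₜ − α ∇θ L, using plain gradient descent on a least-squares loss where the gradient is available in closed form; back-propagation computes the same kind of gradient automatically for a deep network:

import numpy as np

# Toy example: minimize L(θ) = (1/N) Σ_i (θ·x_i − y_i)^2 by gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # N = 100 inputs, 3 features
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=100)

theta = np.zeros(3)
alpha = 0.1                               # learning rate
for t in range(200):
    preds = X @ theta
    grad = 2.0 / len(y) * X.T @ (preds - y)   # ∇_θ L in closed form
    theta = theta - alpha * grad              # θ_{t+1} ← θ_t − α ∇_θ L
print(theta)                              # ≈ [1.0, -2.0, 0.5]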


Recurrent Neural Networks

[Figure omitted. Source: Olah, “Understanding LSTM Networks”, http://colah.github.io/posts/2015-08-Understanding-LSTMs/]


Long Short-Term Memory

Introduced in Hochreiter and Schmidhuber 1997 to solve the exploding/vanishing gradients problem.

[Figure omitted. Source: Olah, “Understanding LSTM Networks”, http://colah.github.io/posts/2015-08-Understanding-LSTMs/]


The Learning Task

Our data will be the following:

Input: 〈Q, M〉 pairs, presented sequentially
Output: T/F, depending on whether M ∈ Q

So, this will be a sequence classification task.

The loss function we minimize is cross-entropy. In our case, with y ∈ {0, 1}, it is very simple:

ℓ(NN(θ, x), y) = − ln(NN(θ, x)_y)   (minus the log of the probability the network assigns to the true label y)
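A sketch of such a sequence classifier in PyTorch. This is not the authors’ implementation (their code is linked below); the hidden size, the one-hot zone encoding of model elements, and the handling of the quantifier identity are all assumptions made for illustration:

import torch
import torch.nn as nn

# Sketch (not the authors' code): an LSTM sequence classifier of the kind described.
# Each element of a model is encoded by a one-hot vector for the zone it lies in
# (A∩B, A\B, B\A, M\(A∪B)); here we simply classify a sequence for a single quantifier.

class QuantifierClassifier(nn.Module):
    def __init__(self, input_size=4, hidden_size=12, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                 # x: (batch, seq_len, input_size)
        _, (h, _) = self.lstm(x)          # h: final hidden state, (1, batch, hidden)
        return self.out(h[-1])            # logits over {False, True}

model = QuantifierClassifier()
loss_fn = nn.CrossEntropyLoss()           # implements ℓ(ŷ, y) = −ln ŷ_y after a softmax
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 10, 4)                 # a dummy batch: 8 sequences of length 10
y = torch.randint(0, 2, (8,))             # dummy truth values
loss = loss_fn(model(x), y)
loss.backward()                           # back-propagation computes ∇_θ L
opt.step()                                # one gradient-descent step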


Data Generation

Algorithm 1 Data Generation Algorithm

Inputs: max_len, num_data, quants

data ← []
while len(data) < num_data do
    N ∼ Unif([max_len])
    Q ∼ Unif(quants)
    cur_seq ← Unif({A ∩ B, A \ B, B \ A, M \ (A ∪ B)}, N)
    if 〈Q, cur_seq〉 ∉ data then
        data.append(generate_point(〈Q, cur_seq, cur_seq ∈ Q?〉))
    end if
end while
return shuffle(data)
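A rough Python rendering of the algorithm above. The zone encoding, the example quantifiers, and the behaviour of generate_point are assumptions made for illustration; the authors’ actual implementation is in the repository linked below:

import random

ZONES = ["AB", "AnotB", "BnotA", "neither"]        # A∩B, A\B, B\A, M\(A∪B)

def at_least_4(seq):                               # example quantifiers over zone sequences
    return sum(z == "AB" for z in seq) >= 4

def exactly_4(seq):
    return sum(z == "AB" for z in seq) == 4

def one_hot(z):
    return [1.0 if z == zone else 0.0 for zone in ZONES]

def generate_data(max_len, num_data, quants, seed=0):
    """Sketch of the data-generation loop: sample a quantifier and a random model
    (as a sequence of zone labels), skip duplicates, label with the truth value."""
    random.seed(seed)
    seen, data = set(), []
    while len(data) < num_data:
        N = random.randint(1, max_len)                       # N ~ Unif([max_len])
        name, Q = random.choice(list(quants.items()))        # Q ~ Unif(quants)
        cur_seq = tuple(random.choice(ZONES) for _ in range(N))
        if (name, cur_seq) in seen:
            continue
        seen.add((name, cur_seq))
        x = [one_hot(z) for z in cur_seq]                    # generate_point: one-hot encode
        data.append((name, x, Q(cur_seq)))                   # label: does the model satisfy Q?
    random.shuffle(data)
    return data

data = generate_data(max_len=10, num_data=100,
                     quants={"at_least_4": at_least_4, "exactly_4": exactly_4})
print(data[0])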



General Set-up

For each proposed semantic universal:

Choose a set of minimally different quantifiers that disagree with respect to the universal

Run some number of trials of training an LSTM to learn those quantifiers

Stop when: the running mean (over 500 minibatches) of total accuracy on the test set is > 0.99, or the mean probability assigned to the correct truth-value on the test set is > 0.99

Measure, for each quantifier Q, its convergence point: the first i such that the mean accuracy for Q from i onward is > 0.98 (sketched in the code below)

Code and Data Available At:

http://github.com/shanest/quantifier-rnn-learning
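The convergence-point measure referred to above can be sketched as follows (my illustration; the accuracy series and threshold are for the example only):

def convergence_point(accuracies, threshold=0.98):
    """First step i such that the mean accuracy from i onward exceeds the threshold.

    `accuracies` is the per-evaluation accuracy series for one quantifier;
    returns None if the criterion is never met.
    """
    for i in range(len(accuracies)):
        tail = accuracies[i:]
        if sum(tail) / len(tail) > threshold:
            return i
    return None

# Example: a run that stabilises above 0.98 from step 3 onward.
print(convergence_point([0.5, 0.7, 0.9, 0.99, 0.99, 1.0, 0.99]))  # 3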


Experiment 1: Monotonicity

Monotonicity Universal (Barwise and Cooper 1981)

All simple determiners are monotone.

Quantifiers: ≥ 4, ≤ 4, = 4


Monotonicity: Results

[Results figures omitted.]

Monotonicity: Discussion

≥ 4 easier than = 4: ✓

≥ 4 easier than ≤ 4, and ≤ 4 not easier than = 4: ???

Upward monotone quantifiers are generally cognitively easier than downward monotone ones [Just and Carpenter 1971; Geurts 2003; Geurts and Slik 2005; Deschamps et al. 2015]
= 4 is a conjunction of monotone quantifiers (≥ 4 and ≤ 4)

A better measure of learning rate than convergence point?


Experiment 2: Quantity

Quantity Universal

All simple determiners are quantitative.

Quantifiers: ≥ 3, first3
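For illustration, here is one way to read the two quantifiers over sequence-presented models (an assumption on my part: first3 is taken to be true iff the first three A-elements, in order of presentation, are also Bs); only first3 is sensitive to order:

# Sketch: models are presented as sequences of zone labels, so an order-sensitive
# quantifier can be defined alongside a purely quantitative one.

def at_least_3(seq):
    """Quantitative: depends only on |A ∩ B|."""
    return sum(z == "AB" for z in seq) >= 3

def first_3(seq):
    """Order-sensitive (assumed reading): the first three A-elements must also be Bs."""
    a_elements = [z for z in seq if z in ("AB", "AnotB")]
    return len(a_elements) >= 3 and all(z == "AB" for z in a_elements[:3])

s1 = ["AB", "AB", "AB", "AnotB"]          # first three As are Bs
s2 = ["AnotB", "AB", "AB", "AB"]          # same zone counts, different order
print(at_least_3(s1), at_least_3(s2))     # True True   (order irrelevant)
print(first_3(s1), first_3(s2))           # True False  (order matters)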


Quantity: Results

[Results figures omitted.]

Quantity: Discussion

Very promising initial results

Harder to learn an order-sensitive quantifier than one that is only sensitive to quantity


Experiment 3: Conservativity

Conservativity Universal (Barwise and Cooper 1981)

All simple determiners are conservative.

Quantifiers: nall, nonly [Motivated by Hunter and Lidz 2013]
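A sketch under my reading that nall abbreviates “not all” (conservative) and nonly a non-conservative mirror image, “not only”, obtained by swapping the roles of A ∩ B and B \ A; this reading is an assumption, not stated on the slide:

def nall(M, A, B):
    """Assumed reading 'not all As are Bs': true iff A \\ B is nonempty. Conservative."""
    return bool(A - B)

def nonly(M, A, B):
    """Assumed reading 'not only As are Bs': true iff B \\ A is nonempty. Not conservative."""
    return bool(B - A)

M, A, B = {1, 2, 3, 4}, {1, 2}, {2, 3}
print(nall(M, A, B), nall(M, A, A & B))    # True True  — conservativity holds here
print(nonly(M, A, B), nonly(M, A, A & B))  # True False — conservativity fails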


Conservativity: Results

[Results figures omitted.]

Conservativity: Discussion

No way of ‘breaking the symmetry’ between A ∩ B and B \ A in this model

More boldly: conservativity as a syntactic/structural constraint, not a semantic universal [See Fox 2002; Sportiche 2005; Romoli 2015]



Towards Meeting the Challenge

Challenge: find a model of learnability on which quantifiers satisfying a universal are easier to learn than those that do not.

We showed how to train LSTM networks via back-propagation to verify quantifiers.

Initial experiments show that this setup may be able to meet the challenge.


Future Work

More and larger experiments

Different models, including baselines for this task

More realistic visual input (see Sorodoc, Lazaridou, et al. 2016; Sorodoc, Pezzelle, et al. 2017)

Tools to ‘look inside’ the black box and see what the network is doing


The End

Thank you!
