Why Syntax is Impossible Mike Dowman

Syntax

Languages have tens of thousands of words

Some combinations of words make valid sentences

Others don’t

No one understands the grammar of any language

Syntax is Complicated!

I saw Bill with Mary yesterday.

You saw WHO with Mary yesterday?!

Who did you see with Mary yesterday?

I saw Bill and Mary yesterday.

You saw WHO and Mary yesterday?!

Who did you see and Mary yesterday? (ungrammatical)

Generative Grammar

An explicit formal system that defines the set of valid sentences in a language

And maybe also explains what each one means

Generative grammar is the core research topic in linguistics

Includes strongly nativist theories and theories proposing that languages are primarily learned

Grammar Writing

Linguists take a selection of possible sentences

And obtain grammaticality judgments for those sentences

Then they produce a grammar that accounts for all the data

Grammar Coverage

Linguists’ grammars only work for selected sentences

They can’t explain most naturally occurring sentences

The more data we consider, the more surprising quirks of syntax emerge

Children’s Language Acquisition

Kids observe a limited number of example sentences

But quickly internalize a system that correctly characterizes the whole language

[Diagram labels: I-language, E-language, LAD]

How can kids do syntax when linguists can’t?

Innate component of language (provided by genes)

Learned component of language (provided by language data)

Linguists have to infer both

Children only have to infer the learned component

Information Theory

Both components of language must contain some amount of information

Data available to children must provide at least as much information as is in the learned component

This puts a limit on the complexity of the learned component of language

Linguists’ Task

Linguists need to have at least as much information as is in the learned and innate components together

Can use data from multiple languages to try to characterize innate components

And can use positive and negative data

Correspondence to Linguistic Theories

Small learned component = parameter setting

Large learned component = learned languages

Small innate component = general learning mechanism

Large innate component = universal grammar

Size of Each Component

Rows: size of the learned component. Columns: size of the innate component.
learn = the child's learning task; ling = the linguist's task.

Learned \ Innate   small                 large                 huge
small              learn = easy          learn = easy          learn = easy
                   ling = easy           ling = hard           ling = impossible
large              learn = hard          learn = hard          learn = hard
                   ling = hard           ling = hard           ling = impossible
huge               learn = impossible    learn = impossible    learn = impossible
                   ling = impossible     ling = impossible     ling = impossible

Which component is large?

As we haven’t yet managed to produce a generative grammar, at least one of the innate and learned components must be large

Children learn relatively easily, so the learned component can’t be too big

How big could the innate component be?

Genome contains 3 billion base pairs = 6 billion bits

Cell metabolism adds more information

Each base pair can be modified

Huge amount of information!
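
As a rough check on the genome figure, the sketch below converts base pairs to bits, assuming each base pair is one of four equally likely bases and so carries log2(4) = 2 bits. The variable names are illustrative, and the result is a raw capacity figure, not an estimate of how much of the genome concerns language.

```python
import math

BASE_PAIRS = 3_000_000_000           # figure quoted on the slide
BITS_PER_BASE_PAIR = math.log2(4)    # assumption: four equally likely bases

genome_bits = BASE_PAIRS * BITS_PER_BASE_PAIR
genome_megabytes = genome_bits / 8 / 1_000_000

print(f"{genome_bits:.0f} bits, about {genome_megabytes:.0f} MB")
```

Run as written, this gives 6 billion bits, or about 750 MB.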

What could be in a huge innate component?

Not word forms: they vary from language to language

Grammaticality patterns

Rules of syntax would be hugely complex

Impossibility of Syntax

On average, each grammaticality judgment can provide no more than one bit of information

If syntax is hugely complex, there will be many grammars that are compatible with any given body of data

But all but one of these grammars would fail when tested on enough new data
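
One way to see why so many grammars survive is to count candidates. The sketch below assumes grammars are bit strings of a fixed length and that each judgment rules out exactly half of the remaining candidates (the idealised one-bit case). The function name and the example sizes are hypothetical.

```python
def min_consistent_grammars(grammar_bits, judgments):
    """Number of candidate grammars left after the given number of judgments.

    Assumes 2**grammar_bits candidates to start with and that every
    judgment halves the candidate set, which is the most one bit can do.
    """
    remaining_bits = max(grammar_bits - judgments, 0)
    return 2 ** remaining_bits


# A 1,000-bit grammar after 200 judgments still leaves 2**800 candidates.
print(min_consistent_grammars(1_000, 200) == 2 ** 800)
```

Under these assumptions the candidate set only shrinks to a single grammar once the number of judgments reaches the number of bits in the grammar.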

A Concrete Example

A multi-agent model

Each agent has:

innate component

learned component

Both are bit strings of fixed length

Sentences are 100-bit strings

Deciding on the Grammaticality of a Sentence 1

Treat the sentence as a binary number

Find:

b_i = s mod n_i

b_l = s mod n_l

b is an index to a bit in the innate (b_i) or learned (b_l) component

n is the number of bits in the innate (n_i) or learned (n_l) component

s is the sentence read as a binary number
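
The index computation can be sketched as follows. This reads s as the value of the sentence as a binary number, which is an interpretation of the slide; the function name and the short example sentence are hypothetical.

```python
def bit_indices(sentence, n_innate, n_learned):
    """Return (b_i, b_l): the selected bit positions in each component."""
    s = int(sentence, 2)  # the sentence read as a binary number
    return s % n_innate, s % n_learned


sentence = "1101"  # a short stand-in; sentences in the model are 100 bits long
print(bit_indices(sentence, n_innate=8, n_learned=3))  # (5, 1)
```

Here 1101 is 13 in decimal, so the indices are 13 mod 8 = 5 and 13 mod 3 = 1.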

Deciding on the Grammaticality of a Sentence 2

A pseudo-random function maps from the two selected bits plus the sentence to a Boolean grammaticality judgment

It’s therefore typically necessary to know every bit of the sentence and both the innate and learned bits to predict the grammaticality of the sentence

Every bit counts

Usually about half of sentences are grammatical, half ungrammatical
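
The slides do not say which pseudo-random function is used. As a minimal sketch, the code below assumes one possible choice: a parity bit taken from a SHA-256 hash of the sentence, XORed with the selected innate and learned bits. The function names, the left-to-right bit indexing and the hash are all assumptions made for illustration.

```python
import hashlib


def sentence_parity(sentence):
    """Pseudo-random bit that depends on every bit of the sentence (assumed: SHA-256)."""
    return hashlib.sha256(sentence.encode("ascii")).digest()[0] & 1


def is_grammatical(sentence, innate, learned):
    """Boolean judgment from the sentence plus one innate and one learned bit."""
    s = int(sentence, 2)
    innate_bit = int(innate[s % len(innate)])
    learned_bit = int(learned[s % len(learned)])
    return bool(sentence_parity(sentence) ^ innate_bit ^ learned_bit)


# Bit strings borrowed from the teacher agent; "1101" is a short stand-in sentence.
print(is_grammatical("1101", innate="10101000", learned="10010101"))
```

With this particular choice, flipping either selected bit always flips the judgment, and changing any bit of the sentence changes the hash, so roughly half of all sentences should come out grammatical.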

4 Kinds of Agent

Agent      Innate     Learned
Teacher    10101000   10010101
Related    10101000   11110001
Unrelated  10110101   00111000
Linguist   00110100   10001100

Learning by Related, Unrelated

Observe a sentence from the teacher

Work out if it is grammatical according to current I-language

If not, invert the relevant bit of the learned component
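
The learning steps above can be sketched as below. The judgment function is an assumed stand-in (a SHA-256 parity bit XORed with the two selected bits), since the slides do not specify the pseudo-random function; the function names, the seed and the sample size of 400 are likewise hypothetical. The bit strings are those of the teacher and related agents.

```python
import hashlib
import random


def is_grammatical(sentence, innate, learned):
    """Assumed judgment function: sentence parity XOR the two selected bits."""
    s = int(sentence, 2)
    parity = hashlib.sha256(sentence.encode("ascii")).digest()[0] & 1
    return bool(parity ^ innate[s % len(innate)] ^ learned[s % len(learned)])


def learn(innate, learned, observed_sentences):
    """Return an updated copy of `learned` after observing the teacher's sentences."""
    learned = list(learned)
    for sentence in observed_sentences:
        # The teacher only utters grammatical sentences, so a sentence the
        # learner rejects signals a wrong bit in the learned component.
        if not is_grammatical(sentence, innate, learned):
            learned[int(sentence, 2) % len(learned)] ^= 1
    return learned


rng = random.Random(0)  # fixed seed so the run is repeatable
teacher_innate = [1, 0, 1, 0, 1, 0, 0, 0]
teacher_learned = [1, 0, 0, 1, 0, 1, 0, 1]

# Sample random 100-bit sentences and keep the ones the teacher accepts.
candidates = (format(rng.getrandbits(100), "0100b") for _ in range(400))
sentences = [s for s in candidates if is_grammatical(s, teacher_innate, teacher_learned)]

# A related learner shares the teacher's innate bits but starts with other learned bits.
related_learned = learn(teacher_innate, [1, 1, 1, 1, 0, 0, 0, 1], sentences)
print(related_learned == teacher_learned)
```

With this assumed function, a related learner that flips a bit always ends up agreeing with the teacher at that position, so it never needs to flip the same bit twice. An unrelated learner, whose innate bits differ, has no such guarantee.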

Grammar Inference by Linguists

Choose random sentences

Ask the teacher if they are grammatical

Store all sentences and grammaticality judgments

Search for a setting of innate and learned components that assigns the correct grammaticality rating to every sentence
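
A toy version of the linguist's procedure is sketched below. The search method is not described on the slides, so exhaustive search over very small components (3 innate bits, 4 learned bits) is used here as an assumption, together with an assumed judgment function (SHA-256 parity XORed with the two selected bits). All names and sizes are hypothetical.

```python
import hashlib
import itertools
import random


def is_grammatical(sentence, innate, learned):
    """Assumed judgment function: sentence parity XOR the two selected bits."""
    s = int(sentence, 2)
    parity = hashlib.sha256(sentence.encode("ascii")).digest()[0] & 1
    return bool(parity ^ innate[s % len(innate)] ^ learned[s % len(learned)])


def consistent_grammars(judgments, n_innate, n_learned):
    """Exhaustively list (innate, learned) pairs that reproduce every stored judgment."""
    found = []
    for innate in itertools.product((0, 1), repeat=n_innate):
        for learned in itertools.product((0, 1), repeat=n_learned):
            if all(is_grammatical(s, innate, learned) == ok for s, ok in judgments):
                found.append((innate, learned))
    return found


rng = random.Random(0)  # fixed seed so the run is repeatable
teacher_innate, teacher_learned = (1, 0, 1), (0, 1, 1, 0)

for n_queries in (2, 8, 64):
    # Choose random sentences and store the teacher's judgment for each one.
    queries = [format(rng.getrandbits(100), "0100b") for _ in range(n_queries)]
    judgments = [(s, is_grammatical(s, teacher_innate, teacher_learned)) for s in queries]
    print(n_queries, len(consistent_grammars(judgments, 3, 4)))
```

The teacher's own setting is always among the survivors. With this assumed function, complementing every bit of both components gives identical judgments, so at least two settings remain however many sentences are queried. Exhaustive search is only feasible at toy sizes; it would not scale to the 1,000-bit components used in the model.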

1,000 Bit Innate and Learned Components

[Figure: curves for the related, unrelated and linguist agents plotted against Number of Example Sentences (0 to 20,000); vertical axis from 0.6 to 1]

1,000 Bit Innate Component 1,000,000 Bit Learned Component

[Figure: curves for the related, unrelated and linguist agents plotted against Number of Example Sentences (0 to 20,000); vertical axis from 0.6 to 1]

1,000,000 Bit Innate Component 1,000 Bit Learned Component

[Figure: curves for the related, unrelated and linguist agents plotted against Number of Example Sentences (0 to 20,000); vertical axis from 0.6 to 1]

Implications of Impossible Syntax

A linguist can write a grammar that will adequately characterize any body of data

But it will fail when tested on new data

Partial grammars are not a stepping stone to complete generative grammars

A Universal Law of Generative Grammar

Generative grammar is impossible if:

H(learned component) + H(innate component) > H(language data)

Unless we can use information from another source (genetic, neuroscientific, psycholinguistic)
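
As a rough illustration, the inequality can be applied to the three component sizes used in the model. The sketch assumes that each component's entropy equals its length in bits and that the data carry at most one bit per grammaticality judgment. The function name is hypothetical, and 20,000 is the largest number of example sentences shown on the plots.

```python
def not_ruled_out(learned_bits, innate_bits, judgments):
    """True if the inequality does not rule out a complete grammar."""
    data_bits = judgments  # assumed upper bound: one bit per judgment
    return learned_bits + innate_bits <= data_bits


# (innate, learned) sizes in bits for the three simulated settings.
for innate, learned in [(1_000, 1_000), (1_000, 1_000_000), (1_000_000, 1_000)]:
    print(innate, learned, not_ruled_out(learned, innate, judgments=20_000))
```

Only the setting with two 1,000-bit components passes the check. The check is a necessary condition only: passing it does not show that a complete grammar can actually be found.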

Why do Syntax?

Studying generative grammar may tell us something about the human mind

It won’t help us build natural language processing systems

Is studying rare and obscure constructions the best way to do syntax?

Conclusion

The idea that we can characterize a language by considering enough linguistic data is a hypothesis

It’s very unlikely that it’s possible to write a complete generative grammar
