
Montague Grammar and MT

Chris Brew, The Ohio State University

http://www.purl.org/NET/cbrew.htm


Machine Translation and Montague Grammar

Great paper by Jan Landsbergen, in Readings in Machine Translation.

• The place of linguistics in MT
• What is the essence of Montague Grammar?
• How can we use it (the essence) in MT?
• The subset problem
• How does this look today?


Possible translations

It must be defined clearly what the correct sentences of the source and target languages are. Linguistic theory provides the means to do this, by providing grammars with associated compositional semantics. Landsbergen suggests a Montague-inspired grammar.

If the input is a correct source language sentence, the output should be a correct target language sentence. This is a condition on the design of the translation system. Landsbergen sketches one approach.

There must be some definition of the information content that the source and target sentences should have in common. This is a call to arms for translation theory; no good solution is currently available.


Best translations

It must be defined clearly what the correct sentences of the source and target languages are. This defines the search space of possible inputs and outputs

If the input is a correct source language sentence, the output should be the best corresponding target language sentence. The system will be evaluated on its treatment of correct sentences.

Robustness with respect to incorrect input is not required.

It could be that there are three sentences e, f and e’ such that f is the best translation of e but e’ is the best translation of f: ‘best translation’ is not a symmetric relation.

By contrast, ‘possible translation’ is symmetric. In addition, if we have three languages E, F, G, then we have transitivity: possible(E-F) ∘ possible(F-G) = possible(E-G).
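A toy sketch (invented word pairs, not from the paper) of what these two properties amount to, if we treat each ‘possible translation’ relation as a set of word pairs: symmetry means the relation can be read in either direction, and transitivity means composing the E-F relation with the F-G relation yields the E-G relation.

```python
# Toy 'possible translation' relations as sets of pairs (invented examples).
possible_EF = {("bank", "banque"), ("bank", "rive")}     # English-French
possible_FG = {("banque", "Bank"), ("rive", "Ufer")}     # French-German

def invert(rel):
    """Symmetry: the relation can be read in the other direction."""
    return {(b, a) for (a, b) in rel}

def compose(r1, r2):
    """Transitivity: possible(E-F) composed with possible(F-G) = possible(E-G)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

print(invert(possible_EF))                 # {('banque', 'bank'), ('rive', 'bank')}
print(compose(possible_EF, possible_FG))   # {('bank', 'Bank'), ('bank', 'Ufer')}
```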


Comparing MT systems

It is possible to reason theoretically about systems that at least aspire to Landsbergen’s principles

There are no obvious grammatical or semantic criteria for evaluating systems when the output is not even a correct sentence of the target language.

Linguists should specify the possible translations. Engineers (or linguists wearing hard hats) should worry about robustness and translation selection.

The robustness part might need to appeal to world knowledge, discourse history, knowledge of the task, and other extralinguistic factors.


The essence of Montague Grammar

There is a set of basic expressions with meanings

Rules are pairs of a syntactic and a semantic rule, where the syntactic and the semantic rules work in lock-step (Rule-to-rule hypothesis)

Either: the semantic rules are operators that build up the semantic value (Montagovian)

Or: the semantic rules build up an expression in some logic, then the expression is interpreted by the rules of the logic to produce a standardized semantic value (echt Montague)
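A minimal sketch of the rule-to-rule idea in Python (the lexicon and the single rule are invented toy examples, not Montague's fragment): each rule pairs a syntactic operation with a semantic operation, and applying the rule applies both halves in lock-step.

```python
# Basic expressions with meanings. Semantic values are atoms or Python
# functions standing in for the meaning operations.
LEXICON = {
    "John":   ("NP", "john"),
    "sleeps": ("VP", lambda subj: f"sleep({subj})"),
}

def lex(word):
    cat, meaning = LEXICON[word]
    return (cat, word), meaning            # (surface tree, semantic value)

# One rule = (syntactic rule, semantic rule).
def syn_s(np_tree, vp_tree):               # syntax: build an S from NP and VP
    return ("S", np_tree, vp_tree)

def sem_s(np_meaning, vp_meaning):         # semantics: apply VP meaning to NP meaning
    return vp_meaning(np_meaning)

RULE_S = (syn_s, sem_s)

def apply_rule(rule, *children):
    """Apply the syntactic and semantic halves of a rule in lock-step."""
    syn, sem = rule
    trees = [tree for tree, _ in children]
    meanings = [meaning for _, meaning in children]
    return syn(*trees), sem(*meanings)

tree, meaning = apply_rule(RULE_S, lex("John"), lex("sleeps"))
print(tree)      # ('S', ('NP', 'John'), ('VP', 'sleeps'))
print(meaning)   # sleep(john)
```

This corresponds to the first ('Montagovian') option: the semantic rule builds the semantic value directly rather than going through an intermediate logical expression.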


Landsbergen’s system

M-grammars have surface trees (S-trees). S-PARSER is standard technology; it generates a parse forest of S-trees.

M-PARSER scans the results of S-PARSER and applies a series of analytical rules to the S-trees, rewriting them step by step. M-PARSER is very powerful, and builds up semantic values along the way.

The result of M-PARSER is a semantic tree that is easy to transfer.


The subset problem

Montague grammars translate natural language into subsets of intensional logic

There is no guarantee that the subset will be the same for every language

Without extra cleverness, the only sentences that can be translated will be those whose IL expressions lie in the intersection of the source-language IL subset and the target-language IL subset.
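A toy illustration of the problem (IL formulas are just strings here, and the lexical mappings are invented): each grammar maps its sentences into its own subset of IL, and only sentences whose formula also occurs on the other side can be translated by matching IL expressions.

```python
# Each grammar's output is its own subset of IL (toy formulas as strings).
english_to_il = {
    "John sleeps":      "sleep(john)",
    "John misses Mary": "miss(john, mary)",
}
dutch_to_il = {
    "Jan slaapt":       "sleep(john)",
    "Jan mist Marie":   "lack(john, mary)",   # a different IL shape
}

# Only formulas in the intersection of the two subsets can be transferred.
dutch_il_subset = set(dutch_to_il.values())
translatable = {s for s, f in english_to_il.items() if f in dutch_il_subset}
print(translatable)   # {'John sleeps'}; 'John misses Mary' falls outside
```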


Isomorphic grammars

To avoid the subset problem, impose the constraint that:

• for every syntactic rule in one language there is a corresponding syntactic rule in every other language, and the meaning operation is the same across the board

• for every basic expression, there is a corresponding basic expression in every other language

This is a really heavy constraint on grammar writers, and it isn’t clear how to do it
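A minimal sketch of what the isomorphy constraint buys (the rule and lexicon names are invented, not Landsbergen's actual M-grammars): because corresponding rules and basic expressions exist in both languages and share the meaning operation, a single derivation tree can be realised in either language, and translation becomes analysis into the shared derivation followed by generation on the other side.

```python
# Corresponding basic expressions in English and Dutch (toy lexicon).
LEXICON = {
    "BELIEVE": {"en": "believes", "nl": "gelooft"},
    "JOHN":    {"en": "John",     "nl": "Jan"},
    "MARY":    {"en": "Mary",     "nl": "Marie"},
}

# Corresponding syntactic rules: same rule name and meaning operation,
# language-specific surface realisation (both SVO here for simplicity).
RULES = {
    "R_S": {
        "en": lambda subj, verb, obj: f"{subj} {verb} {obj}",
        "nl": lambda subj, verb, obj: f"{subj} {verb} {obj}",
    },
}

def realise(derivation, lang):
    """Interpret a shared derivation tree in one language."""
    if isinstance(derivation, str):            # a basic expression
        return LEXICON[derivation][lang]
    rule, *children = derivation
    return RULES[rule][lang](*(realise(c, lang) for c in children))

# One derivation tree, two surface sentences.
derivation = ("R_S", "JOHN", "BELIEVE", "MARY")
print(realise(derivation, "en"))   # John believes Mary
print(realise(derivation, "nl"))   # Jan gelooft Marie
```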


Grammar writing

A set of compositional rules R is written for handling a particular phenomenon in language L, and a corresponding set of rules R’ is written for handling the corresponding phenomenon in language L’ (Landsbergen, p. 250).

Grammar development proceeds in parallel. You test by ensuring that R covers the relevant expressions of L and R’ covers the relevant expressions of L’

The most important practical difference between this and other approaches is probably that the grammars are written with translation in mind.


The claim

If you do this grammar-writing co-ordination, you can get away without worrying about the subset problem

Montague grammar may be way too complicated, but if Dutch geloven works the same way as English believe, you can, in that case, get away with the same theoretically insufficient representation on both sides.

You might be able to control the consequences of putting extra (non-truth-functional) control information into the IL by doing this on a case-by-case basis in order to co-ordinate specific phenomena. (DANGER)


How does this look today?

Practical experience with broad-coverage grammars

We now know that broad-coverage grammars produce large numbers of analyses, most of them crazy.

It definitely pays to do some kind of probabilistic parse selection, even if you have a good broad-coverage grammar.

If your goal is to do well on existing parsing metrics, it works well to learn the grammar from a treebank.


The linguistic question

Given a tree, tell me how to make a score for the tree out of smaller components


Given a tree

Tell me how to break it down into smaller components

Smaller components because these smaller components are going to be common enough that the statistics over them might be reliable

But large enough that the crucial relationships between the parts of the tree have a chance of coming through

Probabilistic context-free grammars are (slightly?) too coarse-grained.

So we adjust them in ways that bring out more of the crucial relationships: add parents, grandparents, head-words, and other clever stuff.
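A minimal sketch with made-up rule probabilities: the tree is broken into the context-free rules it uses, the score is the sum of their log-probabilities, and parent annotation is one of the adjustments that exposes more context to the statistics.

```python
import math

# Toy rule probabilities (invented for illustration).
RULE_LOGPROB = {
    ("S",  ("NP", "VP")):  math.log(0.9),
    ("NP", ("DT", "NN")):  math.log(0.4),
    ("VP", ("VBZ", "NP")): math.log(0.3),
}

def rules_of(tree):
    """Break a tree (label, child, ...) into the CF rules it contains."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return []                              # preterminal over a word
    rules = [(label, tuple(child[0] for child in children))]
    for child in children:
        rules.extend(rules_of(child))
    return rules

def score(tree):
    """PCFG-style score: sum of log-probabilities of the rules used."""
    return sum(RULE_LOGPROB[rule] for rule in rules_of(tree))

def annotate_parents(tree, parent="TOP"):
    """Parent annotation: split categories by context, e.g. NP^S vs NP^VP."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree
    return (f"{label}^{parent}",
            *(annotate_parents(child, label) for child in children))

tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBZ", "chases"),
               ("NP", ("DT", "the"), ("NN", "cat"))))
print(score(tree))              # log-prob summed over the four rules used
print(annotate_parents(tree))   # categories become 'NP^S', 'NP^VP', etc.
```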


Given a translation pair

Tell me how to break it down into smaller components

Smaller components because these smaller components are going to be common enough that the statistics over them might be reliable

But large enough that the crucial relationships between the parts of the pair have a chance of coming through

Language model for the TL: standard technology. Models 1, 2, 3, 4, 5 for the SL-TL correspondence.

Clearly very coarse-grained. How to adjust so that more of the crucial relationships come through? How to think about translation pairs?
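A minimal sketch of the decomposition, with made-up probabilities rather than a trained system: an n-gram language model over the target language plus an IBM Model 1-style word-translation table for the SL-TL correspondence, combined in the usual noisy-channel fashion.

```python
import math

# Toy target-language bigram model, log P(w_i | w_{i-1}).
BIGRAM_LOGPROB = {
    ("<s>", "the"):    math.log(0.5),
    ("the", "house"):  math.log(0.2),
    ("house", "</s>"): math.log(0.4),
}

# Toy Model 1 translation table, t(source_word | target_word).
T = {
    ("la", "the"): 0.7, ("maison", "house"): 0.8,
    ("la", "house"): 0.05, ("maison", "the"): 0.02,
}

def lm_logprob(target_words):
    padded = ["<s>"] + target_words + ["</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, math.log(1e-6))
               for pair in zip(padded, padded[1:]))

def model1_logprob(source_words, target_words):
    # Each source word may align to any target word (or NULL), so sum the
    # translation probabilities over the possible alignment points.
    targets = target_words + ["NULL"]
    return sum(math.log(sum(T.get((s, t), 1e-6) for t in targets) / len(targets))
               for s in source_words)

def score(source, target):
    # Noisy-channel decomposition: P(target) * P(source | target).
    return lm_logprob(target) + model1_logprob(source, target)

print(score(["la", "maison"], ["the", "house"]))
```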


Errorfulness

The Penn Treebank (PTB) is smallish and somewhat errorful. This imposes practical limits on the complexity of models. The more detail you ask for, the less likely your training procedure is to provide it in reliable form.

Hand-written grammars blur the distinction between ungrammaticality and lack of coverage.

It is therefore dangerous for components that use grammars to give too much weight to the grammar’s claims about ungrammaticality

Even when the grammar fails to provide a complete analysis, it could provide useful partial results.


Errorfulness

Current word-aligned corpora are tiny, but do at least exist. Presumably they too are errorful.

Unsupervised learning via EM has dominated the field. This is because nothing better is available. The pseudo-annotation that EM hallucinates is very errorful.

The complexity of models is limited by the need to do EM and by the difficulty of working with errorful annotation.

It is dangerous for the system to believe hard-and-fast things about intertranslatability
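For concreteness, a minimal sketch (two-sentence toy corpus, not real data) of the EM procedure that trains IBM Model 1 translation probabilities from sentence-aligned text: the E-step collects fractional, hallucinated alignment counts, and the M-step re-estimates the translation table from them.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (French-English pairs).
corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"],  ["the", "flower"]),
]

# Initialise t(s|t) uniformly over the source vocabulary.
source_vocab = {s for src, _ in corpus for s in src}
t = defaultdict(lambda: 1.0 / len(source_vocab))

for _ in range(5):                          # a few EM iterations
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src, tgt in corpus:
        for s in src:
            norm = sum(t[(s, w)] for w in tgt)
            for w in tgt:
                c = t[(s, w)] / norm        # E-step: expected alignment count
                counts[(s, w)] += c
                totals[w] += c
    for (s, w), c in counts.items():        # M-step: re-estimate t(s|t)
        t[(s, w)] = c / totals[w]

print(round(t[("maison", "house")], 3))     # rises toward 1.0
print(round(t[("maison", "the")], 3))       # falls toward 0.0
```

The expected counts are the pseudo-annotation: nothing guarantees they match the alignments a human annotator would produce.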


Coverage

To score well, it usually pays to guess, even if:

• the question seems so stupid that no sensible answer is possible

• your answer would be little better than a random guess

Statistical parsers build up models of grammar that always make a guess

The models learn from the whole of the data. They might be designed to learn linguistic things, but they can and do implicitly learn non-linguistic things that turn out to help.


Coverage

To score well, it usually pays to guess, even if:

• the question seems so stupid that no sensible answer is possible

• your answer would be little better than a random guess

Brown-style MT systems have good coverage, and not-bad probabilistic models of <something>. They too learn from the whole of the data.

Their design is shaped partly by the need to model linguistic things (e.g. word order variation) and partly by accidental success in modeling other factors that we don’t understand yet.


Conclusions

There is a clear parallel between Landsbergen’s notion of intertranslatability and Montague’s notion of grammaticality.

Arguably, statistical parsers succeed because they relax the notion of grammaticality, allowing them to handle misfires in the grammar smoothly. Coincidentally, they end up robust to other difficulties, including weaknesses in the statistical models and the training data.



Arguably, MT systems succeed because they relax the notion of intertranslatability (or just fail to even have such a notion).

Coincidentally, this makes them robust to failings in the statistical modeling, the data, and the procedures for data augmentation.

That said, it would be nice to have explicit semantics in MT systems.