
Montague Grammar and MT

Chris Brew, The Ohio State University

http://www.purl.org/NET/cbrew.htm


Machine Translation and Montague Grammar

Great paper by Jan Landsbergen, in Readings in Machine Translation.

• The place of linguistics in MT
• What is the essence of Montague Grammar?
• How can we use it (the essence) in MT?
• The subset problem
• How does this look today?


Possible translations

It must be defined clearly what the correct sentences of the source and target languages are. Linguistic theory provides the means to do this, by providing grammars with associated compositional semantics. Landsbergen suggests a Montague-inspired grammar.

If the input is a correct source language sentence, the output should be a correct target language sentence. This is a condition on the design of the translation system. Landsbergen sketches one approach.

There must be some definition of the information content that the source and target sentences should have in common. This is a call to arms for translation theory; no good solution is currently available.


Best translations

It must be defined clearly what the correct sentences of the source and target languages are. This defines the search space of possible inputs and outputs

If the input is a correct source language sentence, the output should be the best corresponding target language sentence. The system will be evaluated on its treatment of correct sentences.

Robustness with respect to incorrect input is not required.

It could be that there are three sentences e, f and e’ such that f is the best translation of e but e’ is the best translation of f: ‘best translation’ is not a symmetric relation.

By contrast, ‘possible translation’ is symmetric. In addition, if we have three languages E, F, G, then we have transitivity: possible(E-F) ∘ possible(F-G) = possible(E-G).
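A toy sketch (invented word pairs, not from the paper) of what these two properties amount to, if we treat each ‘possible translation’ relation as a set of word pairs: symmetry means the relation can be read in either direction, and transitivity means composing the E-F relation with the F-G relation yields the E-G relation.

```python
# Toy 'possible translation' relations as sets of pairs (invented examples).
possible_EF = {("bank", "banque"), ("bank", "rive")}     # English-French
possible_FG = {("banque", "Bank"), ("rive", "Ufer")}     # French-German

def invert(rel):
    """Symmetry: the relation can be read in the other direction."""
    return {(b, a) for (a, b) in rel}

def compose(r1, r2):
    """Transitivity: possible(E-F) composed with possible(F-G) = possible(E-G)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

print(invert(possible_EF))                 # {('banque', 'bank'), ('rive', 'bank')}
print(compose(possible_EF, possible_FG))   # {('bank', 'Bank'), ('bank', 'Ufer')}
```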


Comparing MT systems

It is possible to reason theoretically about systems that at least aspire to Landsbergen’s principles

There are no obvious grammatical or semantic criteria for evaluating systems when the output is not even a correct sentence of the target language.

Linguists should specify the possible translations. Engineers (or linguists wearing hard hats) should worry about robustness and translation selection.

The robustness part might need to appeal to world knowledge, discourse history, knowledge of the task, and other extralinguistic factors.


The essence of Montague Grammar

There is a set of basic expressions with meanings

Rules are pairs of a syntactic and a semantic rule, where the syntactic and the semantic rules work in lock-step (Rule-to-rule hypothesis)

Either: the semantic rules are operators that build up the semantic value (Montagovian)

Or: the semantic rules build up an expression in some logic, then the expression is interpreted by the rules of the logic to produce a standardized semantic value (echt Montague)
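A minimal sketch of the rule-to-rule idea in Python (the lexicon and the single rule are invented toy examples, not Montague's fragment): each rule pairs a syntactic operation with a semantic operation, and applying the rule applies both halves in lock-step.

```python
# Basic expressions with meanings. Semantic values are atoms or Python
# functions standing in for the meaning operations.
LEXICON = {
    "John":   ("NP", "john"),
    "sleeps": ("VP", lambda subj: f"sleep({subj})"),
}

def lex(word):
    cat, meaning = LEXICON[word]
    return (cat, word), meaning            # (surface tree, semantic value)

# One rule = (syntactic rule, semantic rule).
def syn_s(np_tree, vp_tree):               # syntax: build an S from NP and VP
    return ("S", np_tree, vp_tree)

def sem_s(np_meaning, vp_meaning):         # semantics: apply VP meaning to NP meaning
    return vp_meaning(np_meaning)

RULE_S = (syn_s, sem_s)

def apply_rule(rule, *children):
    """Apply the syntactic and semantic halves of a rule in lock-step."""
    syn, sem = rule
    trees = [tree for tree, _ in children]
    meanings = [meaning for _, meaning in children]
    return syn(*trees), sem(*meanings)

tree, meaning = apply_rule(RULE_S, lex("John"), lex("sleeps"))
print(tree)      # ('S', ('NP', 'John'), ('VP', 'sleeps'))
print(meaning)   # sleep(john)
```

This corresponds to the first ('Montagovian') option: the semantic rule builds the semantic value directly rather than going through an intermediate logical expression.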


Landsbergen’s system

M-grammars have surface trees (S-trees). S-PARSER is standard technology; it generates a parse forest of S-trees.

M-PARSER scans the results of S-PARSER and applies a series of analytical rules to the S-trees, rewriting them step by step. M-PARSER is very powerful, and builds up semantic values along the way.

The result of M-PARSER is a semantic tree that is easy to transfer.


The subset problem

Montague grammars translate natural language into subsets of intensional logic

There is no guarantee that the subset will be the same for every language

Without extra cleverness, the only sentences that can be translated will be those whose IL expressions lie in the intersection of the source-language IL subset and the target-language IL subset.
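A toy illustration of the problem (IL formulas are just strings here, and the lexical mappings are invented): each grammar maps its sentences into its own subset of IL, and only sentences whose formula also occurs on the other side can be translated by matching IL expressions.

```python
# Each grammar's output is its own subset of IL (toy formulas as strings).
english_to_il = {
    "John sleeps":      "sleep(john)",
    "John misses Mary": "miss(john, mary)",
}
dutch_to_il = {
    "Jan slaapt":       "sleep(john)",
    "Jan mist Marie":   "lack(john, mary)",   # a different IL shape
}

# Only formulas in the intersection of the two subsets can be transferred.
dutch_il_subset = set(dutch_to_il.values())
translatable = {s for s, f in english_to_il.items() if f in dutch_il_subset}
print(translatable)   # {'John sleeps'}; 'John misses Mary' falls outside
```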


Isomorphic grammars

To avoid the subset problem, impose the constraint that:

• for every syntactic rule in one language there is a corresponding syntactic rule in every other language, and the meaning operation is the same across the board

• for every basic expression, there is a corresponding basic expression in every other language

This is a really heavy constraint on grammar writers, and it isn’t clear how to do it
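A minimal sketch of what the isomorphy constraint buys (the rule and lexicon names are invented, not Landsbergen's actual M-grammars): because corresponding rules and basic expressions exist in both languages and share the meaning operation, a single derivation tree can be realised in either language, and translation becomes analysis into the shared derivation followed by generation on the other side.

```python
# Corresponding basic expressions in English and Dutch (toy lexicon).
LEXICON = {
    "BELIEVE": {"en": "believes", "nl": "gelooft"},
    "JOHN":    {"en": "John",     "nl": "Jan"},
    "MARY":    {"en": "Mary",     "nl": "Marie"},
}

# Corresponding syntactic rules: same rule name and meaning operation,
# language-specific surface realisation (both SVO here for simplicity).
RULES = {
    "R_S": {
        "en": lambda subj, verb, obj: f"{subj} {verb} {obj}",
        "nl": lambda subj, verb, obj: f"{subj} {verb} {obj}",
    },
}

def realise(derivation, lang):
    """Interpret a shared derivation tree in one language."""
    if isinstance(derivation, str):            # a basic expression
        return LEXICON[derivation][lang]
    rule, *children = derivation
    return RULES[rule][lang](*(realise(c, lang) for c in children))

# One derivation tree, two surface sentences.
derivation = ("R_S", "JOHN", "BELIEVE", "MARY")
print(realise(derivation, "en"))   # John believes Mary
print(realise(derivation, "nl"))   # Jan gelooft Marie
```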


Grammar writing

A set of compositional rules R is written for handling a particular phenomenon in language L, and a corresponding set of rules R’ is written for handling the corresponding phenomenon in language L’ (Landsbergen, p. 250).

Grammar development proceeds in parallel. You test by ensuring that R covers the relevant expressions of L and R’ covers the relevant expressions of L’

The most important practical difference between this and other approaches is probably that the grammars are written with translation in mind.


The claim

If you do this grammar-writing co-ordination, you can get away without worrying about the subset problem

Montague grammar may be way too complicated, but if Dutch geloven works the same way as English believe, you can, in that case, get away with the same theoretically insufficient representation on both sides.

You might be able to control the consequences of putting extra (non-truth-functional) control information into the IL by doing this on a case-by-case basis in order to co-ordinate specific phenomena. (DANGER)


How does this look today?

Practical experience with broad-coverage grammars

We now know that broad-coverage grammars produce large numbers of analyses, most of them crazy.

It definitely pays to do some kind of probabilistic parse selection, even if you have a good broad-coverage grammar.

If your goal is to do well on existing parsing metrics, it works well to learn the grammar from a treebank.


The linguistic question

Given a tree, tell me how to make a score for the tree out of smaller components


Given a tree

Tell me how to break it down into smaller components

Smaller components because these smaller components are going to be common enough that the statistics over them might be reliable

But large enough that the crucial relationships between the parts of the tree have a chance of coming through

Probabilistic context-free grammars are (slightly?) too coarse-grained.

So we adjust them in ways that bring out more of the crucial relationships: add parents, grandparents, head-words, and other clever stuff.
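A minimal sketch with made-up rule probabilities: the tree is broken into the context-free rules it uses, the score is the sum of their log-probabilities, and parent annotation is one of the adjustments that exposes more context to the statistics.

```python
import math

# Toy rule probabilities (invented for illustration).
RULE_LOGPROB = {
    ("S",  ("NP", "VP")):  math.log(0.9),
    ("NP", ("DT", "NN")):  math.log(0.4),
    ("VP", ("VBZ", "NP")): math.log(0.3),
}

def rules_of(tree):
    """Break a tree (label, child, ...) into the CF rules it contains."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return []                              # preterminal over a word
    rules = [(label, tuple(child[0] for child in children))]
    for child in children:
        rules.extend(rules_of(child))
    return rules

def score(tree):
    """PCFG-style score: sum of log-probabilities of the rules used."""
    return sum(RULE_LOGPROB[rule] for rule in rules_of(tree))

def annotate_parents(tree, parent="TOP"):
    """Parent annotation: split categories by context, e.g. NP^S vs NP^VP."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree
    return (f"{label}^{parent}",
            *(annotate_parents(child, label) for child in children))

tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBZ", "chases"),
               ("NP", ("DT", "the"), ("NN", "cat"))))
print(score(tree))              # log-prob summed over the four rules used
print(annotate_parents(tree))   # categories become 'NP^S', 'NP^VP', etc.
```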


Given a translation pair

Tell me how to break it down into smaller components

Smaller components because these smaller components are going to be common enough that the statistics over them might be reliable

But large enough that the crucial relationships between the parts of the pair have a chance of coming through

Language model for the TL: standard technology. Models 1, 2, 3, 4, 5 for the SL-TL correspondence.

Clearly very coarse-grained. How to adjust so that more of the crucial relationships come through? How to think about translation pairs?
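A minimal sketch of the decomposition, with made-up probabilities rather than a trained system: an n-gram language model over the target language plus an IBM Model 1-style word-translation table for the SL-TL correspondence, combined in the usual noisy-channel fashion.

```python
import math

# Toy target-language bigram model, log P(w_i | w_{i-1}).
BIGRAM_LOGPROB = {
    ("<s>", "the"):    math.log(0.5),
    ("the", "house"):  math.log(0.2),
    ("house", "</s>"): math.log(0.4),
}

# Toy Model 1 translation table, t(source_word | target_word).
T = {
    ("la", "the"): 0.7, ("maison", "house"): 0.8,
    ("la", "house"): 0.05, ("maison", "the"): 0.02,
}

def lm_logprob(target_words):
    padded = ["<s>"] + target_words + ["</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, math.log(1e-6))
               for pair in zip(padded, padded[1:]))

def model1_logprob(source_words, target_words):
    # Each source word may align to any target word (or NULL), so sum the
    # translation probabilities over the possible alignment points.
    targets = target_words + ["NULL"]
    return sum(math.log(sum(T.get((s, t), 1e-6) for t in targets) / len(targets))
               for s in source_words)

def score(source, target):
    # Noisy-channel decomposition: P(target) * P(source | target).
    return lm_logprob(target) + model1_logprob(source, target)

print(score(["la", "maison"], ["the", "house"]))
```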


Errorfulness

The Penn Treebank (PTB) is smallish and somewhat errorful. This imposes practical limits on the complexity of models. The more detail you ask for, the less likely your training procedure is to provide it in reliable form.

Hand-written grammars blur the distinction between ungrammaticality and lack of coverage.

It is therefore dangerous for components that use grammars to give too much weight to the grammar’s claims about ungrammaticality

Even when the grammar fails to provide a complete analysis, it could provide useful partial results.


Errorfulness

Current word-aligned corpora are tiny, but do at least exist. Presumably they too are errorful.

Unsupervised learning via EM has dominated the field. This is because nothing better is available. The pseudo-annotation that EM hallucinates is very errorful.

The complexity of models is limited by the need to do EM and by the difficulty of working with errorful annotation.

It is dangerous for the system to believe hard-and-fast things about intertranslatability
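For concreteness, a minimal sketch (two-sentence toy corpus, not real data) of the EM procedure that trains IBM Model 1 translation probabilities from sentence-aligned text: the E-step collects fractional, hallucinated alignment counts, and the M-step re-estimates the translation table from them.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (French-English pairs).
corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"],  ["the", "flower"]),
]

# Initialise t(s|t) uniformly over the source vocabulary.
source_vocab = {s for src, _ in corpus for s in src}
t = defaultdict(lambda: 1.0 / len(source_vocab))

for _ in range(5):                          # a few EM iterations
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src, tgt in corpus:
        for s in src:
            norm = sum(t[(s, w)] for w in tgt)
            for w in tgt:
                c = t[(s, w)] / norm        # E-step: expected alignment count
                counts[(s, w)] += c
                totals[w] += c
    for (s, w), c in counts.items():        # M-step: re-estimate t(s|t)
        t[(s, w)] = c / totals[w]

print(round(t[("maison", "house")], 3))     # rises toward 1.0
print(round(t[("maison", "the")], 3))       # falls toward 0.0
```

The expected counts are the pseudo-annotation: nothing guarantees they match the alignments a human annotator would produce.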


Coverage

To score well, it usually pays to guess, even if:

• the question seems so stupid that no sensible answer is possible

• your answer would be little better than a random guess

Statistical parsers build up models of grammar that always make a guess

The models learn from the whole of the data. They might be designed to learn linguistic things, but they can and do implicitly learn non-linguistic things that turn out to help.


Coverage

To score well, it usually pays to guess, even if:

• the question seems so stupid that no sensible answer is possible

• your answer would be little better than a random guess

Brown-style MT systems have good coverage, and not-bad probabilistic models of <something>. They too learn from the whole of the data.

Their design is shaped partly by the need to model linguistic things (e.g. word order variation) and partly by accidental success in modeling other factors that we don’t understand yet.


Conclusions

There is a clear parallel between Landsbergen’s notion of intertranslatability and Montague’s notion of grammaticality.

Arguably, statistical parsers succeed because they relax the notion of grammaticality, allowing them to handle misfires in the grammar smoothly. Coincidentally, they end up robust to other difficulties, including weaknesses in the statistical models and the training data.



Arguably, MT systems succeed because they relax the notion of intertranslatability (or just fail to even have such a notion).

Coincidentally, this makes them robust to failings in the statistical modeling, the data, and the procedures for data augmentation.

That said, it would be nice to have explicit semantics in MT systems.