Optimal Ambiguity Packing in Context-Free Parsers with Interleaved Unification

Alon Lavie
Carnegie Mellon University

and

Carolyn Penstein Rosé
University of Pittsburgh

Outline

• CF Parsers with Interleaved Unification

• The Problem: Packing with Interleaved Unification

• The Rule Prioritization Heuristic

• Why is the Heuristic Optimal?

• Experimental Evaluation

• Discussion and Conclusions

Unification-Augmented CFGs

• CFGs can be parsed efficiently (cubic time)

• Unification-based grammars (e.g. HPSG) are more difficult to parse efficiently

• Unification-augmented CFGs are a good compromise:
  – context-free backbone grammar
  – rules augmented with unification constraints
  – parsing produces a c-structure and an f-structure

Unification-augmented CFG: Example

(<DECL> <--> (<NP> <VP>)
  (((x2 agr) = (x1 agr))
   ((x0 subject) = x1)
   ((x2 form) = *finite)
   (x0 = x2)))

CF Parsing with Interleaved Unification

• f-structure computation is interleaved with the context-free c-structure computation

• unification of the functional constraints associated with a rule is applied whenever the parser completes a constituent according to that rule

• if parsing is bottom-up: the f-structure of the LHS constituent is computed from the f-structures of the RHS constituents

• if unification fails - the rule fails and the LHS constituent is pruned from further consideration
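
The interleaving described above can be sketched in a few lines of Python. This is a simplified illustration, not the parsers' actual implementation: f-structures are modeled as nested dicts, `unify` and `apply_decl_rule` are hypothetical names, and structure is copied rather than shared (no reentrancy). The rule encoded is the <DECL> example, with x1 the <NP> and x2 the <VP>.

```python
def unify(a, b):
    """Unify two f-structures (nested dicts or atomic values); None means failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for feature, value in b.items():
            if feature in out:
                merged = unify(out[feature], value)
                if merged is None:
                    return None  # conflicting values: unification fails
                out[feature] = merged
            else:
                out[feature] = value
        return out
    return a if a == b else None


def apply_decl_rule(np_fs, vp_fs):
    """Bottom-up f-structure for <DECL> from <NP> (x1) and <VP> (x2), or None."""
    # (x2 agr) = (x1 agr): the agreement features must unify.
    shared = unify({k: v for k, v in np_fs.items() if k == "agr"},
                   {k: v for k, v in vp_fs.items() if k == "agr"})
    if shared is None:
        return None
    # (x2 form) = *finite
    x2 = unify(vp_fs, {"form": "finite", **shared})
    if x2 is None:
        return None
    x1 = unify(np_fs, shared)
    if x1 is None:
        return None
    # (x0 subject) = x1 and x0 = x2
    return unify(x2, {"subject": x1})


np_fs = {"agr": {"num": "sg"}, "root": "dog"}
vp_fs = {"agr": {"num": "sg", "per": 3}, "form": "finite", "root": "bark"}
decl_fs = apply_decl_rule(np_fs, vp_fs)
```

A `None` result corresponds to the rule failing, in which case the parser would prune the LHS constituent.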

Local Ambiguity Packing

• NL grammars are often highly ambiguous

• Number of parses as a function of sentence length may be exponential

• A Local Ambiguity: a portion of the input that can be analyzed as a particular grammar category in multiple ways

• Local Ambiguity Packing: the multiple sub-parses are stored in a common data-structure indexed by a single pointer. The parser can refer to the entire set of sub-parses using this pointer
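
One way to picture a packed node is a table keyed by category and span, where each entry holds every sub-parse found so far. The sketch below is only an illustration of that idea; `PackedChart` and its method names are assumptions, not taken from either parser.

```python
class PackedChart:
    """Stores sub-parses packed by (category, start, end)."""

    def __init__(self):
        self.nodes = {}  # (category, start, end) -> list of alternative analyses

    def add(self, category, start, end, children):
        """Add one analysis; return the node key and whether the node is new."""
        key = (category, start, end)
        is_new = key not in self.nodes
        self.nodes.setdefault(key, []).append(children)
        return key, is_new


chart = PackedChart()
first = chart.add("A", 4, 7, ("B", "C"))   # first analysis creates the node
second = chart.add("A", 4, 7, ("E", "F"))  # second analysis is packed into it
```

Here the key plays the role of the single pointer: parent constituents refer to `("A", 4, 7)` and thereby to both analyses.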

Utilizing Local Ambiguity Packing

• Parsing algorithm must be able to detect all local ambiguities and pack them together

• Some parsing algorithms are better suited for local ambiguity packing:
  – Tabular parsing algorithms synchronize processing so that local ambiguities are easy to identify
  – GLR is not capable of performing full ambiguity packing: only constituents in the same state context are packed
  – Differences in packing effectiveness may account for conflicting evidence on parsing efficiency of Chart parsing versus GLR parsing

The Problem: Ambiguity Packing with Interleaved Unification

• Most CF parsing algorithms are under-specified in terms of how to pursue multiple analyses
  – Parsing actions of different ambiguities may be arbitrarily interleaved
  – In Chart Parsing: which inactive edge should be picked next from the agenda?
  – In GLR Parsing: which of multiple reduce actions should be performed next?
  – The particular order of parsing actions determines if and when local ambiguities are detected

The Problem: Ambiguity Packing with Interleaved Unification

• A new local ambiguity may be detected after the packed constituent has been further processed

• with pure CF parsing - just pack the new analysis into the existing packed node

• Problem with unification - the f-structures have already been computed, must be re-computed

• Alternatively - do not pack, create a new node

• Our Goal: order the parsing actions so that local ambiguities are detected prior to the parse node being further processed.

Example: GLR Parsing

• In GLR parsing - choice of which reduction to perform next

• Assume we just performed a reduction by rule R0:[A --> B C] creating a constituent A: (4,7)

• Assume we have a choice between the following rule reductions:
  – R1:[D --> A], reducing the recent A to D: (4,7)
  – R2:[A --> E F], creating a new constituent A: (4,7)
  – R3:[G --> B A], reducing B and previous A to G: (3,7)

• Preferred choice: R2
  – may allow packing new A with previous A

How to Prioritize the Rules?

• Goal: find a fast rule ordering heuristic that can achieve maximal ambiguity packing

• Main idea: we wish to delay applying rules that further process A until all other As of same span have been detected and packed.

• The Rightmost Criterion: select rule that creates a constituent with the rightmost starting position

• This is sufficient if grammar has no unary or epsilon rules!

• Originally observed by Tomita and applied in GLR implementation, but not published

Improved Heuristic for Unary Rules

• With unary rules, rightmost is not enough:
  – In our example: both R1 and R2 are rightmost, but R1 would further process the previous A before R2 detects a new local ambiguity

• We need to extend the heuristic to model the dependency between constituents in unary rules

• We define a partial order relation GE between constituents:
  – for every unary rule [A --> B] in the grammar, GE(A,B)
  – compute GE* - the transitive closure of GE

• Extended Heuristic: among rightmost rules, pick the one with the “GE-least” LHS category
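
As a rough sketch of how GE* could be precomputed from a grammar, the following collects GE from the unary rules and closes it transitively. The rule representation (pairs of LHS and RHS tuple) and the name `ge_star` are assumptions made for illustration; the small grammar extends the running example with a hypothetical rule [S --> D].

```python
def ge_star(rules):
    """Return GE* as a set of (upper, lower) category pairs."""
    # GE(A, B) for every unary rule [A --> B].
    closure = {(lhs, rhs[0]) for lhs, rhs in rules if len(rhs) == 1}
    changed = True
    while changed:  # naive transitive closure, fine for small grammars
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


rules = [
    ("A", ("B", "C")),  # R0
    ("D", ("A",)),      # R1 (unary)
    ("A", ("E", "F")),  # R2
    ("G", ("B", "A")),  # R3
    ("S", ("D",)),      # hypothetical extra unary rule
]
relation = ge_star(rules)
```

Since the relation depends only on the grammar, it can be computed once before parsing.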

Rule Ordering Heuristic for GLR

Input: a set of applicable grammar rule reductions

Output: a selected grammar rule reduction to perform next

Heuristic:

(1) For each potential grammar rule reduction, determine the span and category of the resulting (reduced) constituent

(2) Select the rule reduction that is rightmost - has the greatest start position

(3) If there are multiple rule reductions that are rightmost, pick one that results in a category that is GE*-least.
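
The three steps can be sketched as below, applied to the R1/R2/R3 example from the earlier slide. The `Reduction` record and `select_reduction` function are illustrative names, and GE* is passed in as a precomputed set of (upper, lower) pairs; the fallback for cyclic cases is an assumption, since the slides do not specify one.

```python
from typing import NamedTuple


class Reduction(NamedTuple):
    name: str
    category: str  # category of the constituent the reduction would create
    start: int     # span start of that constituent
    end: int       # span end of that constituent


def select_reduction(candidates, ge_star):
    """Pick the rightmost reduction; break ties by GE*-least category."""
    # Step (2): keep only reductions with the greatest start position.
    best_start = max(r.start for r in candidates)
    rightmost = [r for r in candidates if r.start == best_start]
    # Step (3): prefer a category that is not above any other candidate's.
    for r in rightmost:
        above_another = any(
            r.category != o.category and (r.category, o.category) in ge_star
            for o in rightmost
        )
        if not above_another:
            return r
    return rightmost[0]  # cyclic case: no least element, pick arbitrarily


candidates = [
    Reduction("R1", "D", 4, 7),
    Reduction("R2", "A", 4, 7),
    Reduction("R3", "G", 3, 7),
]
chosen = select_reduction(candidates, ge_star={("D", "A")})
```

R3 is dropped by the rightmost criterion, and R1 is deferred because GE*(D,A) holds, which leaves R2 as on the example slide.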

Handling Epsilon Rules

• Epsilon rules are still a problem:
  – there may be non-unary rules that further process A and that are still rightmost

• Problem is similar to unary rules and can be treated via a revised partial order:

1. Find all nullable symbols in grammar G

2. Define a revised partial order GEE(A,B):
   (a) if GE(A,B) then GEE(A,B)
   (b) for every rule [A --> B1 B2 … Bk]:
       if all Bi are nullable, then for all i, GEE(A,Bi)
       if exactly one Bi is not nullable, then GEE(A,Bi) for that Bi
   (c) compute GEE* - the transitive closure of GEE
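
A possible static computation of GEE*, following steps 1 and 2 above, is sketched here. The function names and the rule encoding (an epsilon rule has an empty RHS tuple) are assumptions for illustration. Clause (a) needs no separate code in this sketch because a unary rule always falls under one of the two cases of clause (b).

```python
def nullable_symbols(rules):
    """Step 1: symbols that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def transitive_closure(pairs):
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def gee_star(rules):
    """Step 2: build GEE from the rules, then close it transitively."""
    nullable = nullable_symbols(rules)
    gee = set()
    for lhs, rhs in rules:
        non_nullable = [s for s in rhs if s not in nullable]
        if not non_nullable:            # all Bi nullable
            gee.update((lhs, s) for s in rhs)
        elif len(non_nullable) == 1:    # exactly one Bi not nullable
            gee.add((lhs, non_nullable[0]))
    return transitive_closure(gee)


rules = [
    ("Opt", ()),          # epsilon rule
    ("A", ("Opt", "B")),  # A can span exactly what B spans
    ("B", ("C",)),        # unary rule
    ("X", ("B", "C")),    # two non-nullable symbols: no relation
]
relation = gee_star(rules)
```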

Rule Ordering Heuristic: Properties

• The heuristic is extremely fast to apply at runtime

• The GEE* partial order can be statically computed from the grammar

• It is possible for a grammar to have both GEE*(A,B) and GEE*(B,A) - the grammar is cyclic, but unification may resolve the cycle

• This may result in sub-optimal ambiguity packing

• Heuristic is best possible given just the static CF structure of the grammar

• More sophisticated tests are most likely not cost effective computationally

Sketch of Optimality Proof

• Assume it is not optimal

• constituent A created, then B created using A, then another A of same span created and not packed

• assume second A not a result of processing first A

• look at sequence of rules applied after B was created and until second A was created

• all of these constituents A, B, Xi have same span

• according to definition of GEE*, GEE*(A,Xi)

• also GEE*(B,A) thus GEE*(B,Xi)

• at least one of the Xi was available when rule creating B was selected, so B was not least, a contradiction.

Rule Prioritization in Chart Parsing

• The Agenda stores completed constituents waiting to be processed (used to extend active arcs)

• Ambiguity packing is done on items stored in the Agenda (thus, not yet further processed)

• Prioritize the order in which items are taken out from the Agenda

• Same criteria: rightmost and GEE*
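
A minimal sketch of such a prioritized agenda, assuming an acyclic GEE* supplied as a set of (upper, lower) pairs: items are keyed by category and span so that a repeated key is packed instead of queued, and the heap orders items by rightmost start first, then by how many categories lie below the item's category in GEE*. The `Agenda` class and this numeric ranking are illustrative choices, not the LCFlex implementation.

```python
import heapq
from itertools import count


class Agenda:
    def __init__(self, gee_star):
        # For an acyclic GEE*, counting the categories below a category
        # gives an ordering consistent with the partial order.
        self._below = {}
        for upper, lower in gee_star:
            if upper != lower:
                self._below.setdefault(upper, set()).add(lower)
        self._heap = []
        self._packed = {}    # (category, start, end) -> analyses awaiting processing
        self._tie = count()  # keeps heap entries comparable

    def push(self, category, start, end, analysis):
        """Queue a completed constituent; return False if it was packed."""
        key = (category, start, end)
        if key in self._packed:
            self._packed[key].append(analysis)
            return False
        self._packed[key] = [analysis]
        rank = len(self._below.get(category, ()))
        heapq.heappush(self._heap, (-start, rank, next(self._tie), key))
        return True

    def pop(self):
        """Remove and return the next item with all analyses packed so far."""
        _, _, _, key = heapq.heappop(self._heap)
        return key, self._packed.pop(key)


agenda = Agenda({("D", "A")})
agenda.push("D", 4, 7, "d1")
agenda.push("A", 4, 7, "a1")
agenda.push("G", 3, 7, "g1")
agenda.push("A", 4, 7, "a2")  # packed with the waiting A
next_item = agenda.pop()
```

With these inputs the packed A comes off the agenda before D and G, so both analyses of A are available before anything is built on top of it.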

Empirical Evaluations

• Two parsers: a GLR parser and a Chart parser

• Both parsers also have robust versions - GLR* and LCFlex - robust mode adds significant amounts of ambiguity

• Same LFG-style syntactic grammar

• Grammar has 412 rules and 71 categories and produces complete predicate-argument f-structure

• GLR parsing table has 628 states and 8822 actions

• Test set of 520 sentences from ESST domain

Results: Non-Robust Parsers

• Significant improvements in both number of parse nodes and parse times

• For sentences of length 12:
  – GLR: 12% fewer nodes, 21% less time
  – LC Parser: 40% fewer nodes, 21% less time



Results: Robust Parsers

• GLR* run with search beam of 30

• LCFlex set to simulate same skipping behavior of GLR*

• Significant reductions in both number of parse nodes and parsing times

• For sentences of length 12:
  – GLR*: 19% fewer nodes, 44% less time
  – LCFlex: 39% fewer nodes, 21% less time



Additional Independent Evaluation

• Conducted by Paul Placeway at CMU

• Rule ordering heuristic incorporated into independent parsing system for syntactic analysis of documentation manuals:
  – similar grammar formalism
  – different highly efficient Chart Parser with LC predictions, grammar path compression
  – different grammar and test set

Additional Independent Evaluation: Results

Condition           CPU time   Gross Memory   Num       Num
                    (sec)      (kB)           Entries   Arcs

Strawman            2463       690960         592589    406889

Rightmost           2231       603603         491087    357842
                    (10.4%)    (14.5%)        (20.7%)   (13.7%)

Full >=*            2173       599310         483921    353197
  comp to r'most:   (2.7%)     (0.7%)         (1.5%)    (1.3%)
  comp to straw:    (13.3%)    (15.3%)        (22.5%)   (15.2%)
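
The parenthesized percentages are consistent with reductions computed relative to the improved condition's value rather than the baseline's. That reading is inferred from the figures and is not stated on the slides. A small check, with `pct_reduction` and the dictionary keys as illustrative names:

```python
def pct_reduction(baseline, improved):
    """Reduction relative to the improved value, in percent (one decimal)."""
    return round(100.0 * (baseline - improved) / improved, 1)


strawman = {"cpu_sec": 2463, "mem_kb": 690960, "entries": 592589, "arcs": 406889}
rightmost = {"cpu_sec": 2231, "mem_kb": 603603, "entries": 491087, "arcs": 357842}
full = {"cpu_sec": 2173, "mem_kb": 599310, "entries": 483921, "arcs": 353197}

rightmost_vs_straw = {k: pct_reduction(strawman[k], rightmost[k]) for k in strawman}
full_vs_rightmost = {k: pct_reduction(rightmost[k], full[k]) for k in strawman}
full_vs_straw = {k: pct_reduction(strawman[k], full[k]) for k in strawman}
```

Measured against the baseline instead, the Rightmost CPU-time reduction would come out near 9.4% rather than 10.4%.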

Further Issues

• Efficient packing of the f-structures
  – [Maxwell & Kaplan 91,93] [Miyao 99]

• Other strategies for combining CF parsing and unification:
  – sequential composition
  – multi-pass parsing, with partial/full unification

• Additional possible tie-breaking secondary ordering heuristics:
  – use a probabilistic model
  – apply a FIFO or “match the most recent” policy

Future Work

• Further investigate f-structure packing and multi-pass strategies

• Further development of the LCFlex Parser

• Investigating the tight relationship between the parser’s robustness features, search strategy and disambiguation mechanisms
