Optimal Ambiguity Packing in Context-Free Parsers with Interleaved Unification
Alon Lavie
Carnegie Mellon University
and
Carolyn Penstein Rosé
University of Pittsburgh
Outline
• CF Parsers with Interleaved Unification
• The Problem: Packing with Interleaved Unification
• The Rule Prioritization Heuristic
• Why is the Heuristic Optimal?
• Experimental Evaluation
• Discussion and Conclusions
Unification-Augmented CFGs
• CFGs can be parsed efficiently (cubic time)
• Unification-based grammars (e.g. HPSG) are more difficult to parse efficiently
• Unification-augmented CFGs are a good compromise:
  – context-free backbone grammar
  – rules augmented with unification constraints
  – parsing produces a c-structure and an f-structure
Unification-augmented CFG: Example
(<DECL> <--> (<NP> <VP>)
  (((x2 agr) = (x1 agr))
   ((x0 subject) = x1)
   ((x2 form) = *finite)
   (x0 = x2)))
CF Parsing with Interleaved Unification
• f-structure computation is interleaved with the context-free c-structure computation
• the unification of the functional constraints associated with a rule is applied whenever the parser completes a constituent according to that rule
• if parsing is bottom-up: the f-structure of the LHS constituent is computed from the f-structures of the RHS constituents
• if unification fails: the rule fails and the LHS constituent is pruned from further consideration
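As a rough illustration of the bottom-up case, the sketch below applies the four constraints of the <DECL> example rule to an NP and a VP f-structure. It assumes f-structures are plain nested Python dicts with atomic string values and no reentrancy; `FAIL`, `unify`, and `reduce_decl` are hypothetical names chosen for this example, not part of any parser described in the talk.

```python
FAIL = object()  # hypothetical sentinel marking a failed unification


def unify(a, b):
    """Unify two simplified f-structures (nested dicts or atomic values)."""
    if a == {}:          # the empty f-structure unifies with anything
        return b
    if b == {}:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            if key in out:
                merged = unify(out[key], val)
                if merged is FAIL:
                    return FAIL
                out[key] = merged
            else:
                out[key] = val
        return out
    return a if a == b else FAIL  # atoms unify only when equal


def reduce_decl(x1, x2):
    """Compute x0 for <DECL> <--> (<NP> <VP>) from x1 (NP) and x2 (VP)."""
    x2 = unify(x2, {"agr": x1.get("agr", {})})   # ((x2 agr) = (x1 agr))
    if x2 is FAIL:
        return FAIL
    x2 = unify(x2, {"form": "finite"})           # ((x2 form) = *finite)
    if x2 is FAIL:
        return FAIL
    # (x0 = x2) and ((x0 subject) = x1)
    return unify(x2, {"subject": x1})


np = {"agr": "3sg", "root": "she"}
vp_ok = {"agr": "3sg", "form": "finite", "root": "sleep"}
vp_bad = {"agr": "3pl", "form": "finite", "root": "sleep"}

decl = reduce_decl(np, vp_ok)       # succeeds: a DECL constituent is built
pruned = reduce_decl(np, vp_bad)    # agreement clash: the rule fails
```

When `reduce_decl` returns `FAIL`, a parser following the scheme above would simply not create the LHS constituent.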
Local Ambiguity Packing
• NL grammars are often highly ambiguous
• Number of parses as a function of sentence length may be exponential
• A Local Ambiguity: a portion of the input that can be analyzed as a particular grammar category in multiple ways
• Local Ambiguity Packing: the multiple sub-parses are stored in a common data-structure indexed by a single pointer. The parser can refer to the entire set of sub-parses using this pointer
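One way to picture the common data structure is a table keyed by category and span, where each entry collects every sub-parse found for that key. The sketch below is only illustrative: `PackedNode`, `add_analysis`, and the tuple-of-strings representation of a sub-parse are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PackedNode:
    """One node standing for all analyses of a category over a span."""
    category: str
    start: int
    end: int
    analyses: list = field(default_factory=list)  # one entry per sub-parse


def add_analysis(table, category, start, end, children):
    """Pack a new sub-parse into the node for (category, start, end)."""
    key = (category, start, end)
    node = table.get(key)
    if node is None:
        node = table[key] = PackedNode(category, start, end)
    node.analyses.append(children)
    return node  # the single pointer other constituents refer to


table = {}
first = add_analysis(table, "NP", 0, 4, ("NP[0,2]", "PP[2,4]"))
second = add_analysis(table, "NP", 0, 4, ("Det[0,1]", "Nbar[1,4]"))
```

Both calls return the same node, so any parent built on `first` automatically covers the second analysis as well.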
Utilizing Local Ambiguity Packing
• Parsing algorithm must be able to detect all local ambiguities and pack them together
• Some parsing algorithms are better suited for local ambiguity packing:
  – Tabular parsing algorithms synchronize processing so that local ambiguities are easy to identify
  – GLR is not capable of performing full ambiguity packing: it packs only constituents in the same state contexts
  – Differences in packing effectiveness may account for conflicting evidence on parsing efficiency of Chart parsing versus GLR parsing
The Problem: Ambiguity Packing with Interleaved Unification
• Most CF parsing algorithms are under-specified in terms of how to pursue multiple analyses
  – Parsing actions of different ambiguities may be arbitrarily interleaved
  – In Chart Parsing: which inactive edge should be picked next from the agenda?
  – In GLR Parsing: which of multiple reduce actions should be picked to perform next?
  – The particular order of parsing actions determines if and when local ambiguities are detected
The Problem: Ambiguity Packing with Interleaved Unification
• A new local ambiguity may be detected after the packed constituent has been further processed
• with pure CF parsing - just pack the new analysis into the existing packed node
• Problem with unification - the f-structures have already been computed, must be re-computed
• Alternatively - do not pack, create a new node
• Our Goal: order the parsing actions so that local ambiguities are detected prior to the parse node being further processed.
Example: GLR Parsing
• In GLR parsing - choice of which reduction to perform next
• Assume we just performed a reduction by rule R0:[A --> B C] creating a constituent A: (4,7)
• Assume we have a choice between the following rule reductions:
  – R1:[D --> A], reducing the recent A to D: (4,7)
  – R2:[A --> E F], creating a new constituent A: (4,7)
  – R3:[G --> B A], reducing B and previous A to G: (3,7)
• Preferred choice: R2
  – may allow packing new A with previous A
How to Prioritize the Rules?
• Goal: find a fast rule ordering heuristic that can achieve maximal ambiguity packing
• Main idea: we wish to delay applying rules that further process A until all other As of same span have been detected and packed.
• The Rightmost Criterion: select the rule that creates a constituent with the rightmost starting position
• This is sufficient if the grammar has no unary or epsilon rules!
• Originally observed by Tomita and applied in GLR implementation, but not published
Improved Heuristic for Unary Rules
• With unary rules, rightmost is not enough:
  – In our example: both R1 and R2 are rightmost, but R1 would further process the previous A before R2 detects a new local ambiguity
• We need to extend the heuristic to model the dependency between constituents in unary rules
• We define a partial order relation GE between constituents:
  – for every unary rule [A --> B] in the grammar, GE(A,B)
  – compute GE* - the transitive closure of GE
• Extended Heuristic: among rightmost rules, pick the one with the “GE-least” LHS category
Rule Ordering Heuristic for GLR
Input: a set of applicable grammar rule reductions
Output: a selected grammar rule reduction to perform next
Heuristic:
(1) For each potential grammar rule reduction, determine the span and category of the resulting (reduced) constituent
(2) Select the rule reduction that is rightmost - has the greatest start position
(3) If there are multiple rule reductions that are rightmost, pick one that results in a category that is GE*-least.
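The three steps can be sketched in a few lines. This is a minimal illustration, not the implementation used in the GLR parser: `Reduction`, `transitive_closure`, and `select_reduction` are hypothetical names, GE* is represented as a set of (upper, lower) category pairs, and the fallback for cyclic cases is an arbitrary choice made for this example. The data reproduces the earlier R1/R2/R3 scenario.

```python
from typing import NamedTuple


class Reduction(NamedTuple):
    rule: str
    category: str  # category of the constituent the reduction would create
    start: int     # span of that constituent
    end: int


def transitive_closure(pairs):
    """Return the transitive closure of a relation given as a set of pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def select_reduction(candidates, ge_star):
    """Pick the rightmost reduction; break ties by a GE*-least category."""
    best_start = max(c.start for c in candidates)            # step (2)
    rightmost = [c for c in candidates if c.start == best_start]
    for cand in rightmost:                                   # step (3)
        above_another = any(
            other.category != cand.category
            and (cand.category, other.category) in ge_star
            for other in rightmost
        )
        if not above_another:
            return cand
    return rightmost[0]  # cyclic grammar: no least element, pick any


# GE comes from unary rules only; here the single unary rule is R1:[D --> A].
ge_star = transitive_closure({("D", "A")})
candidates = [
    Reduction("R1", "D", 4, 7),
    Reduction("R2", "A", 4, 7),
    Reduction("R3", "G", 3, 7),
]
chosen = select_reduction(candidates, ge_star)
print(chosen.rule)  # R2
```

R3 is excluded by the rightmost test, and R1 is deferred because GE*(D,A) holds, so the new A is created and can be packed before D is built from it.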
Handling Epsilon Rules
• Epsilon rules are still a problem:
  – there may be non-unary rules that further process A and that are still rightmost
• Problem is similar to unary rules and can be treated via a revised partial order:
  1. Find all nullable symbols in grammar G
  2. Define a revised partial order GEE(A,B):
     (a) if GE(A,B) then GEE(A,B)
     (b) for every rule [A --> B1 B2 … Bk]:
         if all Bi are nullable, then for all i, GEE(A,Bi)
         if at most one Bi is not nullable, then GEE(A,Bi)
     (c) compute GEE* - the transitive closure of GEE
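Since GEE* depends only on the grammar, it can be computed once before parsing. The sketch below is one possible reading of the construction: it assumes rules are given as (LHS, RHS-tuple) pairs, and it interprets the second case of (b) as "when exactly one Bi is not nullable, relate A to that Bi only". The function names and the toy grammar are hypothetical.

```python
def nullable_symbols(rules):
    """Step 1: fixed-point computation of symbols that can derive epsilon."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def gee_star(rules):
    """Steps 2(a)-(c): build GEE from the rules and close it transitively."""
    nullable = nullable_symbols(rules)
    gee = set()
    for lhs, rhs in rules:
        non_nullable = [sym for sym in rhs if sym not in nullable]
        if not non_nullable:
            # all Bi nullable: A may cover the same span as any Bi
            gee.update((lhs, sym) for sym in rhs)
        elif len(non_nullable) == 1:
            # assumed reading: only the single non-nullable Bi can share A's span
            # (this also covers unary rules, i.e. case (a))
            gee.add((lhs, non_nullable[0]))
    closure = set(gee)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Toy grammar: E is nullable, so A --> E B behaves like a unary rule over B.
toy_rules = [
    ("A", ("E", "B")),
    ("E", ()),
    ("B", ("C",)),
    ("C", ("c",)),
]
order = gee_star(toy_rules)
```

In this toy grammar the non-unary rule A --> E B contributes GEE(A,B), which is exactly the dependency the rightmost criterion and GE alone would miss.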
Rule Ordering Heuristic: Properties
• The heuristic is extremely fast to apply at runtime
• The GEE* partial order can be statically computed from the grammar
• It is possible for a grammar to have both GEE*(A,B) and GEE*(B,A) - the grammar is cyclic, but unification may resolve the cycle
• This may result in sub-optimal ambiguity packing
• Heuristic is best possible given just the static CF structure of the grammar
• More sophisticated tests are most likely not cost effective computationally
Sketch of Optimality Proof
• Assume it is not optimal
• constituent A created, then B created using A, then another A of same span created and not packed
• assume second A not a result of processing first A
• look at sequence of rules applied after B was created and until second A was created
• all of these constituents A, B, Xi have same span
• according to definition of GEE*, GEE*(A,Xi)
• also GEE*(B,A) thus GEE*(B,Xi)
• at least one of the Xi was available when rule creating B was selected, so B was not least.
Rule Prioritization in Chart Parsing
• The Agenda stores completed constituents waiting to be processed (used to extend active arcs)
• Ambiguity packing is done on items stored in the Agenda (thus, not yet further processed)
• Prioritize the order in which items are taken out from the Agenda
• Same criteria: rightmost and GEE*
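The same two criteria can be applied when choosing the next Agenda item. The sketch below is an assumption-laden toy, not the LCFlex agenda: `Agenda`, its dict-based storage, and the linear scan in `pop` are choices made only to keep the example short. Packing happens in `push`, on items that have not yet been taken out.

```python
class Agenda:
    """Toy agenda: packs waiting items and pops rightmost, GEE*-least first."""

    def __init__(self, gee_star):
        self.gee_star = gee_star  # set of (upper, lower) category pairs
        self.items = {}           # (category, start, end) -> list of analyses

    def push(self, category, start, end, analysis):
        # Items with the same category and span are packed together.
        self.items.setdefault((category, start, end), []).append(analysis)

    def pop(self):
        rightmost = max(start for _, start, _ in self.items)
        keys = [k for k in self.items if k[1] == rightmost]

        def is_least(key):
            return not any(
                other[0] != key[0] and (key[0], other[0]) in self.gee_star
                for other in keys
            )

        chosen = next((k for k in keys if is_least(k)), keys[0])
        return chosen, self.items.pop(chosen)


agenda = Agenda({("D", "A")})
agenda.push("D", 4, 7, "D from first A")
agenda.push("A", 4, 7, "A via B C")
agenda.push("G", 3, 7, "G via B A")
agenda.push("A", 4, 7, "A via E F")
first_key, first_analyses = agenda.pop()
second_key, _ = agenda.pop()
third_key, _ = agenda.pop()
```

Here both analyses of A: (4,7) leave the Agenda together as one packed item, ahead of D and G.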
Empirical Evaluations
• Two parsers: a GLR parser and a Chart parser
• Both parsers also have robust versions - GLR* and LCFlex - robust mode adds significant amounts of ambiguity
• Same LFG-style syntactic grammar
• Grammar has 412 rules and 71 categories and produces complete predicate-argument f-structure
• GLR parsing table has 628 states and 8822 actions
• Test set of 520 sentences from ESST domain
Results: Non-Robust Parsers
• Significant improvements in both number of parse nodes and parse times
• For sentences of length 12:
  – GLR: 12% fewer nodes, 21% less time
  – LC Parser: 40% fewer nodes, 21% less time
Results: Robust Parsers
• GLR* run with search beam of 30
• LCFlex set to simulate same skipping behavior of GLR*
• Significant reductions in both number of parse nodes and parsing times
• For sentences of length 12:
  – GLR*: 19% fewer nodes, 44% less time
  – LCFlex: 39% fewer nodes, 21% less time
Additional Independent Evaluation
• Conducted by Paul Placeway at CMU
• Rule ordering heuristic incorporated into independent parsing system for syntactic analysis of documentation manuals:
  – similar grammar formalism
  – different highly efficient Chart Parser with LC predictions, grammar path compression
  – different grammar and test set
Additional Independent Evaluation: Results
condition           CPU time (sec)   Gross Memory (kB)   Num Entries   Num Arcs
Strawman            2463             690960              592589        406889
Rightmost           2231             603603              491087        357842
                    (10.4%)          (14.5%)             (20.7%)       (13.7%)
Full >=*            2173             599310              483921        353197
  comp to r'most:   (2.7%)           (0.7%)              (1.5%)        (1.3%)
  comp to straw:    (13.3%)          (15.3%)             (22.5%)       (15.2%)
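The parenthesized percentages appear to be computed relative to the improved condition, that is (baseline - improved) / improved, since that convention reproduces every figure in the table. The short check below recomputes them from the raw numbers; the `rows` layout and the `savings` helper are assumptions introduced only for this example.

```python
# Columns: CPU time (sec), gross memory (kB), number of entries, number of arcs
rows = {
    "Strawman": (2463, 690960, 592589, 406889),
    "Rightmost": (2231, 603603, 491087, 357842),
    "Full": (2173, 599310, 483921, 353197),
}


def savings(baseline, improved):
    """Percent by which the baseline exceeds the improved condition."""
    return [round(100 * (b - i) / i, 1) for b, i in zip(baseline, improved)]


rightmost_vs_straw = savings(rows["Strawman"], rows["Rightmost"])
full_vs_rightmost = savings(rows["Rightmost"], rows["Full"])
full_vs_straw = savings(rows["Strawman"], rows["Full"])

print(rightmost_vs_straw)  # [10.4, 14.5, 20.7, 13.7]
print(full_vs_rightmost)   # [2.7, 0.7, 1.5, 1.3]
print(full_vs_straw)       # [13.3, 15.3, 22.5, 15.2]
```

Measured against the strawman instead, the CPU saving of the rightmost criterion would come out somewhat lower (about 9.4%), so the convention matters when comparing these numbers with other reports.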
Further Issues
• Efficient packing of the f-structures
  – [Maxwell & Kaplan 91,93] [Miyao 99]
• Other strategies for combining CF parsing and unification:
  – sequential composition
  – multi-pass parsing, with partial/full unification
• Additional possible tie-breaking secondary ordering heuristics:
  – use a probabilistic model
  – apply a FIFO or “match the most recent” policy
Future Work
• Further investigate f-structure packing and multi-pass strategies
• Further development of the LCFlex Parser
• Investigating the tight relationship between the parser’s robustness features, search strategy and disambiguation mechanisms