TDL Grammars in C-Sharp
MA work supervised by Emily Bender

Glenn Slayden
University of Washington

DELPH-IN Summit
July 2, 2010



Overview

• thai-language.com
• Matrix grammar of Thai
• C#, CLR, CLI, BCL…
• Project progress
  – GLB calculation
  – DAG representation
• This work is at a very early stage: not presenting results today

Another parser?

"The [DELPH-IN] community can always benefit from another parser."

- Dan Flickinger

Goals

• Large-scale processing of TDL grammars
• Truly portable binaries between Windows, Linux, and Mac
• Robust, commercial-grade platform
• Easier to use, more accessible
• Performance: byte code JIT-compiled to native
• Developer productivity: LINQ
• GUI improvement wishlist
  – Grammar visualization tools
  – Printing
  – TDL editor

Thai-language.com

• Started 1997
• Supervised Thai-English lexicon of 50,000 entries
• Message boards, reading exercises, lessons
• Custom-programmed site has followed the evolution of Windows Server technologies from ASP to .NET 4

Matrix Grammar of Thai

• Emily's Ling567 Grammar Engineering, Winter 2009 plus follow-on work
• Phenomena modeled include simple treatments of:
  – aspect auxiliaries
  – numeric-classifiers
  – demonstrative-classifier interaction
  – pro-drop
  – questions
  – adverbs, adjectives
  – copula
  – declarative and interrogative complementizer
  – negation

Current integration between Matrix grammar and thai-language.com

• TDL lexicon emitted
• Testsuite coverage report
  – http://www.thai-language.com/testsuite-results
• LKB integration (spotty) uses LKB socket interface
  – http://www.thai-language.com/id/219435/parse
  – MRS results bug
  – No UTF8 support; uses numeric lexicon
  – Frequent LKB reboots
• Problems with Thai display in LKB, Emacs, and [incr tsdb()]

ECMA-335: Common Language Infrastructure

• Portable binaries!
• Flagship programming language: C#
• F#, Python, Ruby, VB, PowerShell, …
  – http://en.wikipedia.org/wiki/List_of_CLI_Languages
• C++/CLI supports mixed mode
• Fully managed: verifiable code, garbage collection, strong guarantees
• Modern concurrency and threading
• 100% Unicode; outstanding legacy encoding support

Common Type System (CTS)

• Traditional OO paradigm:
  – Fields
  – Members
  – private/public
• Single-object, multiple-interface inheritance
• Polymorphism
• Objects freely interoperate between languages and disparate applications

C#

• Vaguely ANSI-C syntax
• Like all CLI languages: strongly-typed
• No pointers (in safe code), but:
  – supports CLI value types, which require no garbage collection: important for performance
  – single-indirection: value types and references can be passed by reference
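The two sub-bullets above can be illustrated with a small sketch. The `Edge` struct and the method names are hypothetical, chosen only for illustration and not taken from the project: an array of value types stores its elements inline, and `ref` lets a callee write through to the caller's storage without any pointer syntax.

```csharp
using System;

// Hypothetical value type for illustration; the name and fields are assumptions.
public struct Edge
{
    public int Feature;
    public int Target;
}

public static class ValueTypeDemo
{
    // Without 'ref' the struct is copied, so the caller's value is unchanged.
    public static void RetargetCopy(Edge e, int target) { e.Target = target; }

    // With 'ref' the callee writes through to the caller's storage.
    public static void RetargetInPlace(ref Edge e, int target) { e.Target = target; }

    public static void Main()
    {
        // One array object holds all four structs inline; no per-element heap objects.
        var edges = new Edge[4];

        RetargetCopy(edges[0], 7);
        Console.WriteLine(edges[0].Target);   // 0

        RetargetInPlace(ref edges[0], 7);
        Console.WriteLine(edges[0].Target);   // 7
    }
}
```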

Developer Productivity

• "Intellisense" code completion
• Vast Base Class Libraries (BCL)
• Strongly typed "generic" classes (templating)

LINQ

• Adds functional-style programming to the imperative paradigm
• Functional operations on sequences:
  – aggregate, partition, set, project, join, restrict, select, generate, …
• Deferred execution
• Cross-CLI, but most richly exposed in F# and C#
• Optional SQL-like syntax can be used in C#
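A short sketch of the "deferred execution" and "SQL-like syntax" bullets, using made-up data and a hypothetical helper `LongerThan`: the query is only a description until it is enumerated, so an element added after the query is built still shows up in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Builds a query over 'words' without running it (query-expression syntax).
    public static IEnumerable<string> LongerThan(List<string> words, int n)
    {
        return from w in words
               where w.Length > n
               select w;
    }

    public static void Main()
    {
        var words = new List<string> { "cat", "pear" };

        // Nothing is filtered yet; 'longWords' only describes the computation.
        IEnumerable<string> longWords = LongerThan(words, 3);

        // Added after the query was defined, but before it is enumerated.
        words.Add("banana");

        // Enumeration happens here, so "banana" is included.
        Console.WriteLine(string.Join(" ", longWords));   // pear banana
    }
}
```

The method-call form `words.Where(w => w.Length > n)` is equivalent to the query expression used here.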

LINQ example

    using System;
    using System.Linq;

    String[] items = { "cat", "pear", "apple", "cat", "banana",
                       "pear", "pear", "apple" };

    // Group identical words, sort the groups by word,
    // and project each group to a (word, count) pair.
    foreach (var pair in items
        .GroupBy(k => k)
        .OrderBy(grp => grp.Key)
        .Select(grp => new
        {
            word = grp.Key,
            count = grp.Count()
        }))
        Console.WriteLine(pair.word + " " + pair.count);

Output:

    apple 2
    banana 1
    cat 2
    pear 3

So far

• Robust TDL parser
• Builds type system
• Calculates GLB closure
• Maximal inference for authored types
• DAG representation
  – discussion follows

GLB calculation

a := *top*.
b := *top*.
c := b.
d := a & b.
e := a & c.
f := a & c.

LKB result:
12 edges
a-f and a-e are redundant

TGCS result

*top*:
    children: { a b }
a:
    children: { glbtype1 }
b:
    children: { glbtype1 c }
c:
    children: { glbtype2 }
d:
e:
f:
glbtype1:
    children: { glbtype2 d }
glbtype2:
    children: { f e }

10 edges
none redundant

GLB calculation

• Assign bit codes according to Ait-Kaci (1989):
  – every type gets a unique bit position
  – top == -1
  – code is logical 'or' of its child values (assign from leaf to top)
• glbtypes are given by non-zero logical 'and'
• so far so good
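The bit-code scheme can be sketched on the six-type example hierarchy (a through f). This is an illustrative reading of the bullets, not the project's code: the class and method names are hypothetical, `*top*` is left out because its code is simply all ones, and the sketch assumes that a GLB type is needed whenever the 'and' of two codes is non-zero and does not match the code of a type already present.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GlbSketch
{
    // Given the codes of the authored types (excluding *top*), returns the
    // codes of the GLB types that would have to be added. Repeats until no
    // pairwise 'and' produces an unseen non-zero code.
    public static SortedSet<int> MissingGlbCodes(IEnumerable<int> authoredCodes)
    {
        var known = new HashSet<int>(authoredCodes);
        var added = new SortedSet<int>();
        bool changed = true;
        while (changed)
        {
            changed = false;
            int[] snapshot = known.ToArray();
            for (int i = 0; i < snapshot.Length; i++)
                for (int j = i + 1; j < snapshot.Length; j++)
                {
                    int meet = snapshot[i] & snapshot[j];
                    if (meet != 0 && known.Add(meet))
                    {
                        added.Add(meet);
                        changed = true;
                    }
                }
        }
        return added;
    }

    public static void Main()
    {
        // Every type gets a unique bit position.
        const int a = 1, b = 2, c = 4, d = 8, e = 16, f = 32;

        // Code = own bit 'or' the codes of its children, assigned leaf to top.
        int cd = d, ce = e, cf = f;
        int cc = c | ce | cf;        // c has children e, f      -> 52
        int cb = b | cc | cd;        // b has children c, d      -> 62
        int ca = a | cd | ce | cf;   // a has children d, e, f   -> 57

        var glbs = MissingGlbCodes(new[] { ca, cb, cc, cd, ce, cf });
        Console.WriteLine(string.Join(" ", glbs));   // 48 56
    }
}
```

The two codes found, 56 (a 'and' b) and 48 (a 'and' c), correspond to the two GLB types that appear in the TGCS result for this hierarchy.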

bit twiddling

Looking only at GLBs, we can directly obtain the transitive closure of the graph by carefully manipulating their parents and children. For each GLB:

1. Consider it as a parent: add its children
  1a. get a list of draft candidates: all descendants of 'node'. The order in which these are added is important because it determines the order of testing children in the next sub-step. The list is needed because children may be eliminated from the list before the point at which they'd otherwise be added.

bit twiddling (cont.)

  1b. pick the subset of immediate children from this list. While the list is not empty, add the oldest item as a child and then remove all of its descendants from the list.

2. Consider it as a child: add its parents
  2a. get a list of draft candidates between 'node' and *top*
  2b. pick a minimal set of parents from this list. Each selection allows us to eliminate others from the list. This effect is maximized by starting with candidates with the fewest additional bits set beyond the required match.
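Steps 1a and 1b can be sketched as follows. This is one possible reading rather than the project's implementation: the names are hypothetical, the codes are the ones from the six-type example (a=57, b=62, c=52, d=8, e=16, f=32, plus GLB codes 56 and 48), and the sketch assumes that "oldest" can be approximated by ordering candidates so that a type always precedes its own descendants (here, most bits set first).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImmediateChildren
{
    // Number of set bits in a non-negative code.
    static int BitCount(int x)
    {
        int n = 0;
        while (x != 0) { x &= x - 1; n++; }
        return n;
    }

    // A descendant's code is a subset of its ancestor's code.
    public static List<int> Pick(int node, IEnumerable<int> allCodes)
    {
        // 1a: draft candidates are all strict descendants of 'node',
        // ordered so that broader types come before narrower ones.
        List<int> draft = allCodes
            .Where(c => c != node && (c & node) == c)
            .OrderByDescending(c => BitCount(c))
            .ToList();

        // 1b: take the first candidate as a child, then drop it and
        // all of its descendants from the list; repeat until empty.
        var children = new List<int>();
        while (draft.Count > 0)
        {
            int child = draft[0];
            children.Add(child);
            draft.RemoveAll(c => (c & child) == c);
        }
        return children;
    }

    public static void Main()
    {
        int[] codes = { 57, 62, 52, 8, 16, 32, 56, 48 };
        Console.WriteLine(string.Join(" ", Pick(56, codes)));   // 48 8
        Console.WriteLine(string.Join(" ", Pick(48, codes)));   // 16 32
    }
}
```

With these codes, the GLB type for 56 gets the other GLB type (48) and d (8) as children, and 48 gets e (16) and f (32). Step 2b would be the mirror image, starting from the candidate parents with the fewest extra bits.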

DAG Literature Review (selected)

• Structure-sharing (Pereira 1985)
• Non-destructive (Wroblewski 1987)
• Quasi-destructive (Tomabechi 1991)
  – Over-copying
  – Early copying
• Thread safe (van Lohuizen 2000)
  – parallel unification
  – concurrent unification

DAG representation

• DELPH-IN TFS grammars have notable features:
  – Fixed type hierarchy
  – A feature can only be introduced in exactly one place in the type hierarchy
  – When unification is successful, all features remain appropriate for their types
• Can we capitalize on these observations, plus intuition about the efficient use of C# value types, to improve unification performance?

"feature-centric" TFS representation

• Consider two types of TFS edges:
  – Feature-appropriate-for-type
    • These are fixed for the life of the grammar
    • Precomputed and associated with types
  – Value of feature 'F' in TFS node of type 't0' constrained by type 't1'
    • These edges are manipulated at runtime
    • All edges of this type are stored with feature 'F'
• Successful unification (simple case): no edges are "created" or "destroyed"

triple description

Example

t3 & [E t1, F t1].

Intention of this design

• 'generic' list datatypes maintain previously-allocated capacities
• by using generic lists of C# 'value types', rampant DAG copying is expected to invoke minimal (or no) GC activity
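The capacity behaviour that the design relies on can be checked with a small sketch. The `FeatureEdge` struct and its three fields are hypothetical stand-ins and not the project's actual edge layout. The point shown is only that `List<T>.Clear()` resets `Count` while leaving `Capacity` untouched, so a list of structs can be refilled without allocating a new backing array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value-type edge; field names are assumptions for illustration.
public struct FeatureEdge
{
    public int SourceNode;
    public int TargetNode;
    public int TypeId;
}

public static class CapacityDemo
{
    // Fills a list with n struct edges, clears it, and reports whether the
    // backing capacity survived the Clear() call.
    public static bool ClearKeepsCapacity(int n)
    {
        var edges = new List<FeatureEdge>();
        for (int i = 0; i < n; i++)
            edges.Add(new FeatureEdge { SourceNode = i, TargetNode = i + 1, TypeId = 0 });

        int capacityBefore = edges.Capacity;
        edges.Clear();   // Count becomes 0; the backing array is kept.

        return edges.Count == 0 && edges.Capacity == capacityBefore;
    }

    public static void Main()
    {
        Console.WriteLine(ClearKeepsCapacity(1000));   // True
    }
}
```

Because the elements are structs stored inline in the backing array, reusing such a list for the next copy creates no per-edge heap objects for the collector to trace.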

Continuing work

• This development environment has very rich concurrency support: consider and seize upon opportunities that present themselves
• Availability of rich test cases (ERG, Thai matrix grammar) facilitates rapid development
• Next step: parsing

References (1/2)

• Hassan Ait-Kaci, Robert Boyer, Patrick Lincoln, Roger Nasr. 1989. "Efficient Implementation of Lattice Operations"
• Ulrich Callmeier. 2001. Efficient Parsing with Large-Scale Unification Grammars. MA Thesis, Universität des Saarlandes, Fachrichtung Informatik.
• Ulrich Callmeier. 2000. PET: a platform for experimentation with efficient HPSG processing techniques. Natural Language Engineering 6(1): 99-107.
• Bernd Kiefer, Hans-Ulrich Krieger, John Carroll, and Rob Malouf. 1999. A Bag of Useful Techniques for Efficient and Robust Parsing. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. 473-480.
• Robert Malouf, John Carroll, and Ann Copestake. 2000. Efficient feature structure operations without compilation. Natural Language Engineering, 1(1):1-18.
• Fernando C. N. Pereira. 1985. A structure-sharing representation for unification-based grammar formalisms. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, IL, 8-12 July 1985, pages 137-144.
• Hideto Tomabechi. 1991. Quasi-destructive graph unification. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA.

References (2/2)

• Hideto Tomabechi. 1992. Quasi-destructive graph unifications with structure-sharing. In Proceedings of the 15th International Conference on Computational Linguistics (COLING-92), Nantes, France.
• Hideto Tomabechi. 1995. Design of efficient unification for natural language. Journal of Natural Language Processing, 2(2):23-58.
• Marcel P. van Lohuizen. 1999. Parallel processing of natural language parsers. In PARCO '99.
• Marcel P. van Lohuizen. 2000. Exploiting parallelism in unification-based parsing. In Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT 2000), Trento, Italy.
• Marcel P. van Lohuizen. 2000. Memory-efficient and Thread-safe Quasi-Destructive Graph Unification. In Proceedings of the 38th Meeting of the Association for Computational Linguistics, Hong Kong, China, 2000.
• Marcel P. van Lohuizen. 2001. A generic approach to parallel chart parsing with an application to LinGO. In Proceedings of the 39th Meeting of the Association for Computational Linguistics, Toulouse, France.
• David A. Wroblewski. 1987. Nondestructive graph unification. In Proceedings of the 6th National Conference on Artificial Intelligence (AAAI-87), 582-589. Morgan Kaufmann.