1

Developing a German grammar for analysis and generation using OpenCCG

Ciprian Gerstenberger, University of Saarland

IGK Colloquium January 13th 2005

2

Outline

1. NLP environments: a comparison

2. The choice: OpenCCG

3. The formalism: MMCCG

4. The German grammar

5. Future work

3

Dialogue systems

building dialogue systems → linguistic resources

linguistic resources → tools for developing and maintaining

wide range of different NLP environments

⇒ Which is the most appropriate environment for our purposes?

4

NLP environments for dialogue systems

General requirements

• both analysis and generation
• multi-lingual
• easy domain reconfigurability

Requirements for NLG

• realization of contextually sensitive utterances
• linguistically motivated control over flexible sentence realization

5

NLP environments for dialogue systems

Technical requirements

• freely available
• well documented
• offering support when needed
• freely available resources (for German)
• efficient
• platform independent

6

NLP environments

• KPML (Lisp): Systemic-Functional Grammar (SFG)
• OpenCCG (Java): Multi-Modal Combinatory Categorial Grammar (MMCCG)
• Babel (Prolog): Head-Driven Phrase Structure Grammar (HPSG)
• LKB (Lisp): Head-Driven Phrase Structure Grammar (HPSG)
• XLE (C): Lexical Functional Grammar (LFG)
• XTAG (Lisp): Tree Adjoining Grammar (TAG)
• XDG (Oz): Topological Dependency Grammar (TDG)

7

NLP environments: Babel

Babel-System (S. Müller)

• implementing HPSG
• Prolog
• only analysis, no generation
• multi-lingual (?)
• resources for German: grammar with good coverage
• freely available
• documentation
• support (?)

8

NLP environments: LKB

LKB

• implementing HPSG
• Lisp
• multi-lingual
• both analysis and generation
• but: resources for German not usable for generation
• freely available
• documentation
• support (?)

9

NLP environments: XTAG

XTAG

• implementing TAG
• Lisp
• both analysis and generation
• multi-lingual
• resources for German (DFKI ?)
• freely available
• documentation
• support (?)

10

NLP environments: XDG

XDG

• implementing TDG
• Oz
• only analysis (generation as dependency parsing using TAGs)
• multi-lingual (?)
• resources for German (toy grammars)
• freely available
• documentation (?)
• support

11

NLP environments: KPML

KOMET-Penman Multilingual Linguistic resource development

• implementing Systemic-Functional Grammar (SFG)
• Lisp
• multi-lingual
• flexible generation
• good sentence realization control
• only for generation, no parsing
• resources for German: grammar with good coverage
• freely available
• documentation and support

12

NLP environments: XLE

Xerox Linguistic Environment

• implementing LFG
• C and Tcl/Tk
• multi-lingual
• both analysis and generation
• resources for German (not freely available)
• documentation
• support
• not freely available

13

NLP environments: OpenCCG

OpenCCG

• implementing Multi-Modal Combinatory Categorial Grammar (MMCCG)
• open source Java-based NLP library
• both analysis and generation
• multi-lingual
• no resources for German, but grammars for English
• freely available
• documentation
• support

14

NLP environments: The Choice

OpenCCG

• Java-based NLP library → platform independent
• analysis and generation → uniform grammar resources
• multi-lingual → extendable
• used and in use in several other projects: FLIGHTS, COMIC, COSY
• supporting output format for TTS (e.g. APML)
• optimized sentence realization
• flexible generation
• sentence realization control

15

Basic formalism: CCG

Combinatory Categorial Grammar

lexicalized grammar formalism

lexical items are assigned syntactic categories

combinatory rules
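The three points above can be made concrete with a minimal sketch, assuming the textbook CCG setup: atomic categories (s, np, n), functor categories built with slashes, and the two application rules. The class and function names (`Atom`, `Functor`, `forward_apply`, `backward_apply`) are illustrative only and are not part of OpenCCG.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as s, np or n."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg seeks its argument to the right,
    result\\arg seeks it to the left."""
    result: "Category"
    slash: str  # "/" or "\\"
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


S, NP, N = Atom("s"), Atom("np"), Atom("n")
determiner = Functor(NP, "/", N)          # np/n
intransitive_verb = Functor(S, "\\", NP)  # s\np

noun_phrase = forward_apply(determiner, N)               # np/n  n  =>  np
clause = backward_apply(noun_phrase, intransitive_verb)  # np  s\np  =>  s
print(noun_phrase, clause)
```

Because all syntactic information sits in the lexical categories, the rule set stays tiny: the grammar writer mainly edits the lexicon rather than adding phrase-structure rules.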

16

MMCCG

Multi-Modal Combinatory Categorial Grammar

refining CCG by introducing means of controlling the application of combinatory rules

specifying modes on category forming operators (slashes)

making application of rules dependent on the slash mode

four basic modes governing different levels of associativity and permutativity
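How a slash mode can gate rule application is sketched below. The slides do not list the modes, so the sketch assumes the usual multi-modal CCG inventory: a most restrictive mode that licenses application only, an associative mode, a permutative mode, and a mode that is both. The names `Mode` and `licensed_rules` are hypothetical, and the check is simplified to a single slash, whereas composition in a full system inspects the modes of both input slashes.

```python
from enum import Enum
from typing import Set


class Mode(Enum):
    STAR = "*"      # non-associative, non-permutative
    DIAMOND = "<>"  # associative only
    CROSS = "x"     # permutative only
    DOT = "."       # associative and permutative


ASSOCIATIVE = {Mode.DIAMOND, Mode.DOT}
PERMUTATIVE = {Mode.CROSS, Mode.DOT}


def licensed_rules(mode: Mode) -> Set[str]:
    """Return the combinatory rules a slash with this mode may take part in."""
    rules = {"application"}  # application is available in every mode
    if mode in ASSOCIATIVE:
        rules.add("harmonic composition")  # order-preserving re-bracketing
    if mode in PERMUTATIVE:
        rules.add("crossed composition")   # allows word-order permutation
    return rules


for mode in Mode:
    print(mode.name, sorted(licensed_rules(mode)))
```

Assigning a restrictive mode to a lexical slash therefore blocks unwanted reorderings for that item without changing the rule set itself.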

17

Example

Der Hund sieht die Katze.

18

Example (cont.)

Der Hund sieht die Katze.
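The derivation shown on these slides is not preserved in this text. As a stand-in, here is a small chart-parsing sketch for the example sentence, assuming textbook categories (determiner np/n, noun n, transitive verb (s\np)/np) and application rules only. These lexical entries are assumptions for illustration, not the categories of the grammar presented in the talk.

```python
from itertools import product

# A string is an atomic category; a tuple (result, slash, arg) is a functor.
NP_N = ("np", "/", "n")
TV = (("s", "\\", "np"), "/", "np")

# Hypothetical toy lexicon for the example sentence.
LEXICON = {"Der": NP_N, "die": NP_N, "Hund": "n", "Katze": "n", "sieht": TV}


def show(cat):
    """Render a category in the usual slash notation."""
    if isinstance(cat, str):
        return cat
    result, slash, arg = cat
    return f"({show(result)}{slash}{show(arg)})"


def combine(left, right):
    """Return (rule, result) pairs for forward (>) and backward (<) application."""
    out = []
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        out.append((">", left[0]))
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        out.append(("<", right[0]))
    return out


def parse(words):
    """CKY-style chart parse; returns the categories for the whole string
    and a log of the combination steps."""
    n = len(words)
    chart = {(i, i + 1): {LEXICON[w]} for i, w in enumerate(words)}
    steps = []
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            cell = chart.setdefault((start, end), set())
            for mid in range(start + 1, end):
                for l, r in product(chart[(start, mid)], chart[(mid, end)]):
                    for rule, res in combine(l, r):
                        cell.add(res)
                        span = " ".join(words[start:end])
                        steps.append(f"{span}: {show(l)} {show(r)} {rule} {show(res)}")
    return chart[(0, n)], steps


result, steps = parse("Der Hund sieht die Katze".split())
for step in steps:
    print(step)
print("result:", sorted(show(c) for c in result))
```

With application alone the sentence receives a single analysis of category s; a grammar that also has to cover verb-initial and verb-final clauses would need richer verb categories than this toy lexicon provides.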

19

Developing a German Grammar

joint work with Magdalena Wolska (DIALOG Project)

Desiderata

• uniform resources for analysis and generation
• covering all phenomena in our domains
• achieving more generality in the grammar than is required by the phenomena encountered in our (relatively small) corpora

20

Phenomena

Some phenomena in German

• agreement
• position of the finite verb
• Topological Fields: controlling the Vorfeld
• complex sentences
• ambiguity
• controlling sentence realization

21

Lexical forms

22

Agreement
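The agreement slides consisted of figures that are not preserved here. The general idea, determiner and noun must carry compatible case, gender and number features, can be sketched as feature unification over small dictionaries. The feature names and the partial paradigms for "der" and "die" below are a simplified illustration, not the feature geometry of the grammar described in the talk.

```python
def unify(a, b):
    """Unify two flat feature dicts; return None if any feature clashes."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


# Partial, simplified paradigms: each form lists its possible readings.
DETERMINERS = {
    "der": [
        {"case": "nom", "gender": "masc", "num": "sg"},
        {"case": "dat", "gender": "fem", "num": "sg"},
        {"case": "gen", "gender": "fem", "num": "sg"},
        {"case": "gen", "num": "pl"},
    ],
    "die": [
        {"case": "nom", "gender": "fem", "num": "sg"},
        {"case": "acc", "gender": "fem", "num": "sg"},
        {"case": "nom", "num": "pl"},
        {"case": "acc", "num": "pl"},
    ],
}
NOUNS = {
    "Hund": {"gender": "masc", "num": "sg"},
    "Katze": {"gender": "fem", "num": "sg"},
}


def agree(det, noun):
    """Return every feature bundle on which determiner and noun agree."""
    readings = (unify(reading, NOUNS[noun]) for reading in DETERMINERS[det])
    return [r for r in readings if r is not None]


print(agree("der", "Hund"))   # one reading: nominative masculine singular
print(agree("der", "Katze"))  # two readings: dative and genitive feminine singular
print(agree("die", "Hund"))   # no reading: the combination is rejected
```

The same mechanism filters analyses during parsing and prevents ill-formed combinations during generation, which is one reason a single grammar can serve both directions.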

23

Agreement (cont.)

24

Agreement/Complex sentences

25

Clause types

Verb-initial clauses

• yes/no questions: Soll ich den Titel zu der Liste hinzufügen?

• alternative questions: Möchtest Du Mozart oder Bach hören?

• imperatives: Wähle das Album „Californication“ von den Red Hot Chili Peppers!

26

Clause types (cont.)

Verb-second clauses

• main declarative: Der Titel wurde hinzugefügt.

• wh-question: Welcher Künstler spielt „Missunderstood“?

27

Clause types (cont.)

Verb-final clauses

• subordinate clause: Wenn Sie möchten, kann ich „We Just Can't Get Enough CCG“ abspielen.

• relative clause: Ich nehme aus den ersten vier Alben, die du hast, jeweils den ersten Song.

• complement clause: Ich glaube, daß das Album „Dangerously In Love“ heißt.

28

Topological Fields

Controlling the Vorfeld occupation using flags

29

Topological Fields (cont.)

Controlling the Vorfeld occupation using flags

30

Analysis: Ambiguities

Der Hund von dem traurigen Mann den ich sah rennt.

31

Analysis: Ambiguities (cont.)

Das Kind rennt wenn der Hund rennt weil die Katze rennt.

32

Generation

Sentence realization without control

33

Generation (cont.)

Sentence realization with control: fronted subject

34

Generation (cont.)

Sentence realization with control: fronted object
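The realization examples on these slides are not preserved. The effect of the control can be illustrated with a toy linearizer for a verb-second main clause: the caller chooses which constituent occupies the Vorfeld, the finite verb follows in second position, and the remaining constituent goes into the Mittelfeld. The function `realize` and its `front` parameter are hypothetical; OpenCCG's realizer works from logical forms with the grammar itself, not from a template like this.

```python
def realize(verb, subject, obj, front="subject"):
    """Linearize a verb-second clause with the chosen constituent fronted."""
    constituents = {"subject": subject, "object": obj}
    if front not in constituents:
        raise ValueError(f"cannot front unknown constituent: {front!r}")
    vorfeld = constituents.pop(front)         # exactly one constituent is fronted
    mittelfeld = list(constituents.values())  # whatever was not fronted
    sentence = " ".join([vorfeld, verb, *mittelfeld])
    return sentence[0].upper() + sentence[1:] + "."


print(realize("sieht", "der Hund", "die Katze", front="subject"))
print(realize("sieht", "der Hund", "die Katze", front="object"))
```

Both outputs express the same proposition: the nominative form "der Hund" marks the subject even when the object is fronted, so the choice of Vorfeld filler can be left to the dialogue context.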

35

Future Work (1)

• extending the grammar wrt the two domains currently modelled (MP3 and maths tutorial)
• (AP, NP, sentence, etc.) coordination
• complex NP (e.g. postmodifications)
• control and raising verbs
• particle verbs (Ich spiele den Song ab vs. Ich möchte den Song abspielen)
• Topological Fields: scrambling in the Mittelfeld

36

Future Work (2)

• analysis: coping with partial input, ill-formed utterances
• generation: realizing elliptical output
• using a dynamic morphological module
• development of an ontology
