30
mputing Science, University of Aberdeen CS4025: Grammars Grammars for English Features Definite Clause Grammars See J&M Chapter 9 in 1 st ed, 12 in 2 nd , Prolog textbooks (for DCGs)

CS4025: Grammars

Embed Size (px)

DESCRIPTION

CS4025: Grammars. Grammars for English Features Definite Clause Grammars See J&M Chapter 9 in 1 st ed, 12 in 2 nd , Prolog textbooks (for DCGs). Grammar: Definition. The surface structure (syntax) of sentences is usually described by some kind of grammar . - PowerPoint PPT Presentation

Citation preview

Page 1: CS4025: Grammars

Computing Science, University of Aberdeen 1

CS4025: Grammars

Grammars for English Features Definite Clause Grammars

See J&M Chapter 9 in 1st ed, 12 in 2nd, Prolog textbooks (for DCGs)

Page 2: CS4025: Grammars

Computing Science, University of Aberdeen 2

Grammar: Definition

The surface structure (syntax) of sentences is usually described by some kind of grammar.

A grammar defines syntactically legal sentences.» John ate an apple (syn legal)» John ate apple (not syn legal)» John ate a building (syn legal)

More importantly, a grammar provides a description of the structure of a legal sentence

Page 3: CS4025: Grammars

Computing Science, University of Aberdeen 3

Facts about Grammars

Nobody has ever constructed a complete and correct grammar for any natural language

There is no single “correct” grammar for any natural language (though certain terminology has emerged that is generally useful)

Linguists even dispute exactly what formalism should be used for stating a grammar

Grammars describe subsets of NLs. They all have their strengths and weaknesses.

Page 4: CS4025: Grammars

Computing Science, University of Aberdeen 4

Sentence Constituents

Grammar rules mention constituents (e.g. NP – noun phrase) as well as single word types (noun)

A constituent can occur in many rules» NP also after a preposition (near the dog), in

possessives (the cat’s paws),... These capture to some extent ideas such as:

» Where NP occurs in a rule, any phrase of this class can appear

» NP’s have something in common in terms of their meaning

» Two phrases of the same kind, e.g. NP, can be joined together by the word “and” to yield a larger phrase of the same kind. Phrase structure tests.

Page 5: CS4025: Grammars

Computing Science, University of Aberdeen 5

Common Constituents

Some common constituents » Noun phrase: the dog, an apple, the man I saw yesterday» Adjective phrase: happy, very happy» Verb group: eat, ate, am eating, have been eating.» Verb phrase: eating an apple, ran into the store» Prepositional phrase: in the car» Sentence: A woman ate an apple.

Constituents named “X phrase” in general have a most important word, the head, of category X (underlined above). The head determines properties of the rest of the phrase.

Page 6: CS4025: Grammars

Computing Science, University of Aberdeen 6

Phrase-Structure Grammar

PS grammars use context-free rewrite rules to define constituents and sentences» S => NP Verb NP

Similar to rules used to define syntax of programming languages.» if-statement => if condition then statement

Also called context-free grammar (CFG) because the context does not restrict the application of the rules.

Adequate for (almost all of?) English» see J&M, chap 13

Page 7: CS4025: Grammars

Computing Science, University of Aberdeen 7

Another simple grammar

– S => NP VP– NP => Det Adj* Noun– NP => ProperNoun– VP => IntranVerb– VP => TranVerb NP– Det: a , the– Adj: big , black– Noun: dog , cat– ProperNoun: Fido , Misty, Sam– IntranVerb: runs, sleeps– TranVerb: chases, sees, saw

How does this assign structure to simple sentences?

Page 8: CS4025: Grammars

Computing Science, University of Aberdeen 8

Example: Sam sleeps

VP (VerbPhrase)

IntranVerb

Sam sleeps

Sentence

ProperNoun

NP (NounPhrase)

Rewrite/expansion rules.

Page 9: CS4025: Grammars

Computing Science, University of Aberdeen 9

Ex: The cat sees the dog

VP

Det Noun

TranVerb

cat sees the dog

Sentence

NP

The

Det Noun

NP

Page 10: CS4025: Grammars

Computing Science, University of Aberdeen 10

Fido chases the big black cat

VP

Det Adj

TranVerb

Fido chases the big

Sentence

NPProperNoun

NP

Adj

black

Noun

cat

Page 11: CS4025: Grammars

Computing Science, University of Aberdeen 11

Recursion

NL Grammars allow recursionNP = Det Adj* CommonNoun PP*PP = Prep NP

For example:The man in the store saw JohnI fixed the intake of the engine of BA’s first

jet.

Page 12: CS4025: Grammars

Computing Science, University of Aberdeen 12

The man in the store saw John

man

Sentence

PP

Det Noun

Prep

in the store

NP

The

Det Noun

NP

VP

[saw John]

Page 13: CS4025: Grammars

Computing Science, University of Aberdeen 13

Grammar is not everything

Sentences may be grammatically OK but not acceptable (acceptability?)» The sleepy table eats the green idea.» Fido chases the black big cat.

Sentences may be understandable even if grammatically flawed.» Fido chases a cat small.

BUT: Grammar terminology may be useful for

describing sentences even if they are flawed (e.g. in tutoring systems)

Grammars may be useful for analysing fragments within unrestricted and badly-formed text.

Page 14: CS4025: Grammars

Computing Science, University of Aberdeen 14

Features

Features are annotations on words and consituents Grammar rules use unification (as in Prolog) to

manipulate features. See references on unification, which is informally feature matching to the most common feature set.

Allow grammars to be smaller and simpler. If we accommodated all features in the above grammar, it would be very complex.

See J&M, chap 11 (1st edition), chap 15 (2nd edition)

Page 15: CS4025: Grammars

Computing Science, University of Aberdeen 15

Ex: Number

One feature is [number:N], where N is singular or plural.» A noun’s number comes from morphology (and

semantics)– cat - singular noun– cats - plural noun

» An NP’s number comes from its head noun» A VP’s number comes from its verb (aux verb?)

– is happy - singular VP– are happy - plural VP

» NP and VP must agree in number:– The rat has an injury– *The rat have an injury

Page 16: CS4025: Grammars

Computing Science, University of Aberdeen 16

Ex: Number

Similar issue between determiners and the head noun:» This tall man...» *This tall men...» *These tall man...» These tall men...

Languages can be richer (Slavic) or poorer (English, Chinese) morphologically.

Many different sorts of features in the world's languages.

Page 17: CS4025: Grammars

Computing Science, University of Aberdeen 17

Modified Grammar

» S => NP[num:N] VP[num:N]» NP[num:N] => Det[num:N] Adj* Noun[num:N]

PP*» NP[num:singular] => ProperNoun» VP[num:N] => IntranVerb[num:N]» VP[num:N] => TranVerb[num:N] NP» PP => Prep NP» Noun[num:sing]: dog» Noun[num:plur]: dogs etc.» ??Sentence features?

Page 18: CS4025: Grammars

Computing Science, University of Aberdeen 18

Example rule

NP[num:N] = Det[num:N] Adj* Noun[num:N] An NP’s determiner and noun (but not its adjectives

in English v Russian) must agree in number.– this dog is OK– these dogs is OK– this dogs is not OK (det must agree)– these dog is not OK (det must agree)– the big red dog is OK– the big red dogs is OK (adjs don’t agree)

[number of NP] is [number of Noun] Might also need some 'feature passing' in complex

structures.

Page 19: CS4025: Grammars

Computing Science, University of Aberdeen 19

Ex: These cats see this dog

VP[plur]

Det[sing] Noun[sing]

TranVerb[plur]

cats see this dog

Sentence[plur]

NP[sing]

These

Det[plural] Noun[plur]

NP[plur]

Page 20: CS4025: Grammars

Computing Science, University of Aberdeen 20

Ex: *This cats see this dog

VP[plur]

Det[sing] Noun[sing]

TranVerb[plur]

cats see this dog

*Sentence[]

NP[sing]

This

Det[sing] Noun[plur]

NP[??]

Page 21: CS4025: Grammars

Computing Science, University of Aberdeen 21

Other features

Person (I, you, him) Verb form (past, present, base, past participle,

pres participle) Gender (masc, fem, neuter) Polarity (positive, negative) etc, etc, etc

Page 22: CS4025: Grammars

Computing Science, University of Aberdeen 22

Definite Clause Grammars

A definite clause grammar (DCG) is a CFG with added feature information

The syntax is also Prolog-like, and DCGs are usually implemented in Prolog

You don’t need to know Prolog to use DCGs

The formalism was (partially) invented in Scotland

Page 23: CS4025: Grammars

Computing Science, University of Aberdeen 23

DCG Notation

Rule notation DCG notationS => NP VP s ---> np, vp.S[num:N] =>

NP[num:N] VP[num:N] s(Num) ---> np(Num), vp(Num).

N => dog n ---> [dog].N => New York n ---> [new, york].NP => <name> np ---> [X], {isname(X)}.

NB the number of “-” in the “--->” varies according to the implementation

Variables have to match, e.g. N, Num, X....

Page 24: CS4025: Grammars

Computing Science, University of Aberdeen 24

DCG Notation (2)

Use ---> between rule LHS and RHS. Non-terminal symbols (e.g., np) written as

Prolog constants (start with lower case letters) Features (eg, Num) are arguments in ()

» Features are shown by position, not name» Prolog variables (start with upper case letters)

indicate unknown values Actual words are lists, in [] Prolog tests can be attached in {} (Optional) use distinguished predicate to

define the goal symbol (eg, s).

Page 25: CS4025: Grammars

Computing Science, University of Aberdeen 25

An example DCG

distinguished(s).s ---> np(Num), vp(Num).np(Num) ---> det(Num), noun(Num).np(sing) ---> name.vp(Num) ---> intranverb(Num).vp(Num) ---> tranverb(Num), np(_).det(_) ---> [the].noun(sing) ---> [dog].name ---> [sam].intranverb(sing) ---> [sleeps].intranverb(plur) ---> [sleep].tranverb(sing) ---> [chases].tranverb(plur) ---> [chase].

Page 26: CS4025: Grammars

Computing Science, University of Aberdeen 26

DCG’s and Prolog

DCG’s are translated into Prolog in a fairly straightforward fashion» See Shapiro, The Art of Prolog

In fact, Prolog was originally designed as a language for NLP, as part of a machine-translation project.

Prolog has evolved since, but is still well-suited to many NLP operations.

Page 27: CS4025: Grammars

Computing Science, University of Aberdeen 27

Edinburgh Package

We will use a parsing system for DCGs developed (for teaching purposes) by Holt and Ritchie from Edinburgh University.

Use SWI Prolog

Page 28: CS4025: Grammars

Computing Science, University of Aberdeen 28

Using the package

Statements always end with “.” Prompt is ?- Type code into a file, not directly into the

interpreter. Load files with [file].

» .pl = prolog source By default, type “n.” to backtrack request

Page 29: CS4025: Grammars

Computing Science, University of Aberdeen 29

Using the DCG package

?- prp([sam, chases, the, dog]). parse and pretty print?- prp(np, [the, dog]). parse and print an np?- parse([sam, chases, the, dog], X). returns parse in X?- parse(np, [the, dog], X). parse a constituent (np)

For all the above, after each parse is printed, you will be asked if you want to look for additional parses

NB for other DCG implementations, the grammar writer explicitly builds parse structures in the grammar rules.

Page 30: CS4025: Grammars

Computing Science, University of Aberdeen 30

Other Grammar Formalisms

Many other grammar formalisms» GPSG, HPSG, LFG, TAG, ATN, GB,...

No consensus on which is best, although many strong opinions (like C vs Pascal vs Lisp)

Like programming languages, formalism used often determined by experience of developers, and available support tools